Using the Expat XML Parser from an RPG Program, Part 4

Article ID: 50719

In previous issues of this newsletter, I demonstrated how to use the Expat XML parser from an RPG program. In this issue, I'll try to answer some of the questions that I've received from readers about the Expat utility.

Q: Does Expat have a way of sending XML tag attributes to my program? Consider the following example. I need to be able to get the comment from the ORDER tag as well as the ID, status and type from the DOCUMENT tag. Can it be done?

<Order comment="Special Order">
  <document id="16555" status="Copy" type="850" language="EN">
    <creationDate>2005-01-28</creationDate>
    <creationTime>09:12:41</creationTime>
  </document>
</Order>

A: Expat will pass the attributes as an array of pointers to your start element handler. As I discussed in the first article of this series, the start element handler's prototype looks like this:

     D start           PR
     D   d                                 likeds(mydata)
     D   elem                          *   value
     D   attr                          *   dim(32767) options(*varsize)

The third parameter is an array that contains the attributes of the XML tag that Expat has just finished parsing. The first pointer in the array will point to a null-terminated string that contains the first attribute name. The second pointer will point to the value of the first attribute. The third will point to the second attribute name, and the fourth will point to the second attribute's value. This continues until one of them contains the *NULL special value. When you receive a *NULL, you know that you've reached the end of the attribute list.

Using your DOCUMENT tag, above, as an example, this is what the start element handler would receive in its parameters:

     elem    points to 'document'
     attr(1) points to 'id'
     attr(2) points to '16555'
     attr(3) points to 'status'
     attr(4) points to 'Copy'
     attr(5) points to 'type'
     attr(6) points to '850'
     attr(7) points to 'language'
     attr(8) points to 'EN'
     attr(9) is *NULL

Each string that the attr array points to is a variable-length string that ends with the x'00' character. RPG provides the %str() BIF that accepts a pointer as its parameter, and it'll convert the data located at that pointer to an RPG VARYING field. So, for example, to convert the data in attr(1) to an RPG field, you'd run the following code:

     attrname = %str(attr(1));

Keep in mind that the attribute name is still in ASCII, UTF-8, or whatever encoding the document uses. You'll need to use the iconv() API, as described in part 4 of this series, to convert it to EBCDIC. For example:

     D p_input         s               *
     D inputlen        s             10U 0
     D p_output        s               *
     D outputlen       s             10U 0
      . 
      .
            AttrName = *blanks;
            p_input = attr(1);
            inputlen = %len(%str(attr(1)));

            p_output = %addr(AttrName);
            outputlen = %size(attrName);

            iconv(d.ic: p_input: inputlen: p_output: outputlen);

After this code runs, the AttrName field will contain an EBCDIC representation of the attribute name. You'll have to do the same thing for the attribute value.

The XLATEICONV sample program from part 4 of this series demonstrated looping through each attribute, converting that attribute to an EBCDIC value, and then trying to map the value to a variable in your RPG program. Here's a code snippet from that program that illustrates the process:

         x = 1;
         dow attr(x) <> *NULL;

            AttrName = *blanks;
            p_input = attr(x);
            inputlen = %len(%str(attr(x)));
            p_output = %addr(AttrName);
            outputlen = %size(attrName);
            iconv(d.ic: p_input: inputlen: p_output: outputlen);

            AttrVal = *blanks;
            p_input = attr(x+1);
            inputlen = %len(%str(attr(x+1)));
            p_output = %addr(AttrVal);
            outputlen = %size(AttrVal);
            iconv(d.ic: p_input: inputlen: p_output: outputlen);

            select;
            when d.stack(d.depth) = '/invoice'
                 and attrName = 'id';
               d.id = attrVal;

            when d.stack(d.depth) = '/invoice/ShipTo/address'
                 and attrName = 'type';
               d.shipto.type = attrVal;

            when d.stack(d.depth) = '/invoice/BillTo/address'
                 and attrName = 'type';
               d.billto.type = attrVal;
            endsl;

            x = x + 2;
         enddo;

Q: My character data handler only receives part of the character data. What's wrong?

A: Remember that to feed your XML document to Expat, you call the XML_Parse() routine in a loop. On each call to XML_Parse(), you pass a buffer full of data. Since you don't load the entire XML document into memory at once, you can handle very large documents without consuming huge amounts of memory.

By the same token, Expat doesn't want to save all of that character data in its memory until it reaches the end of an XML element. If it did, a very large XML element would, again, consume huge amounts of memory.

To keep the memory usage relatively small, Expat may call your character data handler many times, just as you called its XML_Parse() routine many times! If you need to keep all of the data for a single element in one variable, you should keep concatenating the data until your ending element handler has been called.

I've written a sample program called FULLELEM that does this. If you download the ZIP file containing the source code for this article, you can try it out.
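
The following is not the FULLELEM source, only a minimal sketch of the concatenation technique described above. It assumes a user data structure with a hypothetical buf subfield, a hypothetical ending element handler named endelem, and matching prototypes (PR) for all three handlers, which are omitted here. It also assumes that the elements of interest contain only text, because the buffer is cleared whenever a new element starts.

     D mydata          ds                  qualified
     D   buf                      65535A   varying

      * Start element handler: begin each element with an empty buffer.
     P start           B
     D start           PI
     D   d                                 likeds(mydata)
     D   elem                          *   value
     D   attr                          *   dim(32767) options(*varsize)
      /free
         d.buf = '';
      /end-free
     P                 E

      * Character data handler: Expat may call this several times for
      * one element, so append each chunk instead of overwriting.
     P chardata        B
     D chardata        PI
     D   d                                 likeds(mydata)
     D   string                   65535A   const options(*varsize)
     D   len                         10I 0 value
      /free
         if len > 0;
            d.buf = d.buf + %subst(string: 1: len);
         endif;
      /end-free
     P                 E

      * Ending element handler: d.buf now holds all of the text that
      * was received for this element.
     P endelem         B
     D endelem         PI
     D   d                                 likeds(mydata)
     D   elem                          *   value
      /free
         // Use d.buf here (translate it with iconv() first if you
         // need EBCDIC), then clear it for the next element.
         d.buf = '';
      /end-free
     P                 E

Note that a VARYING field of this size silently truncates anything beyond 65535 bytes, so this approach only suits elements that are known to fit in the buffer.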

Q: Is it possible to work with character data that's larger than 64k?

A: Yes. Expat will never pass your character data handler more data than you gave to it when you called XML_Parse(). To handle data larger than that buffer, it will call your character data handler many times. See the previous question for a discussion of this.

This behavior makes it possible to handle data of any size. Because the data isn't delivered all at once, your program receives it one chunk at a time. You can process that chunk, and then wait for more.

For example, in your start element handler for a particularly large element, you might want to create a temporary stream file. In the character data handler, you could write any data that you receive to that stream file. In the ending element handler, you'd close the stream file and possibly run a program that knows how to handle the data you just saved to the file.

Frequently, when there are large amounts of data in a single element like this, the data will be base64 encoded. If you save the base64-encoded data to a temporary stream file, you can run it through a base64 decoding routine when you reach the ending element handler and save the result to a more permanent file in the IFS.
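
As a rough sketch of the stream-file approach described above, the three handlers might cooperate as shown below. Several things here are assumptions made for illustration: the element path '/invoice/attachment', the file name '/tmp/bigelem.tmp', the fd subfield, and the handler name endelem are all hypothetical, and the depth and stack subfields are assumed to be maintained elsewhere in the same way as in the attribute example earlier in this article. The character data parameter is declared as a pointer passed by value, which is one possible way to map the C parameter, so that it can be handed directly to write(). Prototypes for the handlers are omitted, the large element is assumed to contain only text, and the bytes are written exactly as Expat delivers them, without translation.

      * Prototypes and constants for the IFS APIs. Equivalent
      * definitions usually live in a copy member.
     D open            PR            10I 0 extproc('open')
     D   path                          *   value options(*string)
     D   oflag                       10I 0 value
     D   mode                        10U 0 value options(*nopass)
     D write           PR            10I 0 extproc('write')
     D   fildes                      10I 0 value
     D   buf                           *   value
     D   nbyte                       10U 0 value
     D close           PR            10I 0 extproc('close')
     D   fildes                      10I 0 value

     D O_WRONLY        C                   2
     D O_CREAT         C                   8
     D O_TRUNC         C                   64
     D S_IRUSR         C                   256
     D S_IWUSR         C                   128

     D mydata          ds                  qualified inz
     D   depth                       10I 0
     D   stack                     1024A   varying dim(32)
     D   fd                          10I 0 inz(-1)

      * Start element handler: open the temporary file when the
      * large element begins.
     P start           B
     D start           PI
     D   d                                 likeds(mydata)
     D   elem                          *   value
     D   attr                          *   dim(32767) options(*varsize)
      /free
         if d.stack(d.depth) = '/invoice/attachment';
            d.fd = open('/tmp/bigelem.tmp'
                       : O_WRONLY + O_CREAT + O_TRUNC
                       : S_IRUSR + S_IWUSR);
            // d.fd is negative if the open failed
         endif;
      /end-free
     P                 E

      * Character data handler: write each chunk as it arrives.
     P chardata        B
     D chardata        PI
     D   d                                 likeds(mydata)
     D   string                        *   value
     D   len                         10I 0 value
      /free
         if d.fd >= 0 and len > 0;
            callp write(d.fd: string: len);
         endif;
      /end-free
     P                 E

      * Ending element handler: close the file. It is now complete
      * and can be decoded or passed to another program.
     P endelem         B
     D endelem         PI
     D   d                                 likeds(mydata)
     D   elem                          *   value
      /free
         if d.fd >= 0;
            callp close(d.fd);
            d.fd = -1;
         endif;
      /end-free
     P                 E

Because only one chunk is in memory at a time, the size of the element is limited by disk space rather than by the size of an RPG field.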

Information on how to create a temporary file in the IFS was covered in the January 27, 2005, issue of this newsletter. You can find that on the Web at the following link:
http://www.iseriesnetwork.com/provipcenter/index.cfm?fuseaction=ShowNewsletterIssue&ID=19990

You can find a service program for decoding base64 documents on my Web site at the following link:
http://www.scottklement.com/base64/

Q: I'm having trouble getting your sample programs to work on a V5R1 system. Can you help?

A: V5R1 supports the use of the LIKEDS keyword to create a data structure that's like another data structure. However, it does not let you nest data structures or create arrays of them. To do that, you'll need V5R2 or later.

You could potentially convert my sample programs to V5R1-compatible code by eliminating these nested data structures and arrays of data structures.
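
As a minimal sketch of what such a conversion might look like, a nested data structure can be flattened into individually named subfields, and an array of data structures can be replaced by parallel arrays. The subfield names and lengths below are hypothetical and are not taken from the sample programs; attrVal stands for the translated attribute value from the earlier example.

      * V5R2 or later could nest a data structure and use an array of
      * data structures, for example:
      *    d.shipto.type      d.item(n).qty
      * On V5R1, flat subfields and parallel arrays hold the same data.
     D mydata          ds                  qualified inz
     D   id                          10A
     D   shipto_type                 10A
     D   billto_type                 10A
     D   item_count                  10I 0
     D   item_sku                    15A   dim(50)
     D   item_qty                    10I 0 dim(50)
      .
      .
      /free
         d.shipto_type = attrVal;

         d.item_count = d.item_count + 1;
         d.item_sku(d.item_count) = attrVal;
      /end-free

The code that fills in the fields changes only in how it names them, so the parsing logic itself can stay the same.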

The source code download for this week's article is V5R1 compatible, however. The FULLELEM sample program demonstrates most, if not all, of the techniques that I've shown you over the past few issues of this newsletter. Take a look and see if it answers your questions.

You can download the code for this article from the iSeries Network's Web site at the following link:
http://www.pentontech.com/IBMContent/Documents/article/50719_16_ExpatFullElem.zip

The source code for my iSeries port of the Expat XML parser can be found on my Web site at the following link:
http://www.scottklement.com/expat/
