The Expat XML parser is an open source XML stream parser that was written in C. This parser works so well, and is so efficient, that it has been incorporated into quite a few other open source projects. For example, the Mozilla and Firefox Web browsers use Expat to parse XML documents. This article will introduce you to some of the basic concepts of calling Expat from an RPG program. I will write articles in future newsletters that take it a step further, explaining Expat more fully.
Expat is a service program written in C. It's a stream-oriented parser, similar in many ways to SAX, but perhaps a little lower-level. You write code that reads data from an XML document, usually stored in a stream file in the IFS, and you pass that data in chunks to Expat.
I ported Expat to the iSeries so that I could use it in my own programs. I've created a /COPY member called EXPAT_H that contains the prototypes for the Expat procedures that I use. Note, however, that Expat was written for ASCII computers, so the XML data that you send to it must also be in ASCII. Since XML stream files in the IFS are usually stored in ASCII, that's rarely a problem.
To get started, use the IFS APIs to open the document for reading only. Do not ask the open() API to convert the data (that is, do not specify the O_TEXTDATA flag), because the document must remain in ASCII rather than be translated to EBCDIC.
fd = open('/tmp/testdoc.xml': O_RDONLY);
if (fd < 0);
  // handle error.
endif;
Once you've successfully opened the document, you need to create a spot in memory where Expat can keep track of where it left off parsing your document, what its current status is, and so on. To do that, you call the XML_ParserCreate() subprocedure in the EXPAT service program. This subprocedure accepts one parameter that's used to identify the encoding of the XML document. You can pass *NULL for this parameter to have it take the default encoding. For example, you can call XML_ParserCreate() as follows:
     . . .
     D p               s                   like(XML_Parser)
      /free
       . . .
       p = XML_ParserCreate(*NULL);
       if (p = *NULL);
         // error! Expat couldn't allocate memory to use to
         // keep track of this XML document.
       endif;
       . . .
The next step is to pass the document to Expat so that it can parse it. To do that, read the stream file one chunk at a time and pass each chunk to Expat. Continue to do that, in a loop, until you've reached the end of the XML document.
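dou (done = 1);
  len = read(fd: %addr(Buff): %size(Buff));
  if (len < 1);
    // end of file (0) or read error (-1): flag the final call
    // and never pass a negative length to XML_Parse().
    done = 1;
    len = 0;
  endif;
  if (XML_Parse(p: Buff: len: done) = XML_STATUS_ERROR);
    // build the message first, while the parser is still allocated
    errormsg = 'Parse error at line '
             + %char(XML_GetCurrentLineNumber(p)) + ': '
             + %str(XML_ErrorString(XML_GetErrorCode(p)));
    XML_ParserFree(p);
    callp close(fd);
    // show error message to user, and exit...
  endif;
enddo;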
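The loop above relies on a few work fields that the snippet does not declare. A minimal sketch of those declarations might look like the following. The field sizes (an 8 KB read buffer and a 256-byte message) are assumptions chosen for illustration; they do not come from the EXPAT_H member or from Expat itself, so adjust them to match the prototypes you are compiling against.

```rpgle
      * Work fields for the read/parse loop.
      * ASSUMPTION: sizes are illustrative only.
     D fd              s             10I 0
     D Buff            s           8192A
     D len             s             10I 0
     D done            s             10I 0 inz(0)
     D errormsg        s            256A   varying
```

A larger buffer means fewer calls to read() and XML_Parse(), but Expat accepts chunks of any size, so the choice mainly affects how often the loop runs.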
Once you've finished reading the document, the memory that was reserved for keeping track of things is no longer needed. You should call the XML_ParserFree() subprocedure in Expat to free up the memory. The following code snippet frees up the memory and closes the stream file:
XML_ParserFree(p);
callp close(fd);
Now that you know how to feed your document to Expat, you'll want to know how to get some results back. To get results, you have to register "handler" subprocedures. The following code demonstrates registering two subprocedures, one called start() and the other called end():
XML_SetStartElementHandler(p: %paddr(start));
XML_SetEndElementHandler(p: %paddr(end));
These handlers must be registered before you call XML_Parse() for the first time. As Expat parses the XML document, it will call the start() subprocedure each time it encounters a start tag, and it'll call the end() subprocedure for each ending XML tag. For example, consider the following XML document:
<?xml version="1.0"?>
<invoice id="54-12343">
  <ShipTo>
    <name>Scott Klement</name>
    <address type="residence">
      <addrLine1>123 Sesame St</addrLine1>
      <city>New York</city>
      <state>NY</state>
      <zipCode>54321</zipCode>
    </address>
  </ShipTo>
  . . .
</invoice>
As Expat receives this document, it'll first see the start of the <invoice> tag. When it sees that, it'll call the start() subprocedure, since that's registered as the handler for the start of each XML tag. After that subprocedure completes, Expat will read in the <ShipTo> tag and once again call the start element handler. It will proceed to call the start element handler for the <name> tag.
Since no subprocedure has been registered for character data, Expat will skip over the text "Scott Klement" and read in the </name> tag. Because this is an ending element, Expat will call the end() subprocedure, which is the one registered to handle ending elements.
It will continue through the entire document in this fashion, calling the start and end handlers as each XML tag is found in the document.
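To make the calling sequence concrete, here is a rough sketch of what a start()/end() pair could look like if all it did was track how deeply nested the current element is. The parameter lists are an assumption: they simply mirror the handler signatures of the C version of Expat (user data, element name, attribute list), with every pointer passed by value. The prototypes in the EXPAT_H member are the authority, so check them before borrowing this. The depth field is a hypothetical global added for the illustration.

```rpgle
      * ASSUMPTION: parameters mirror the C handler signatures;
      * verify against the prototypes in EXPAT_H.
      * "depth" is a hypothetical global used only for this sketch.
     D depth           s             10I 0 inz(0)

      * Called by Expat for every start tag.
     P start           B
     D start           PI
     D   userData                      *   value
     D   elem                          *   value
     D   attr                          *   value
      /free
        depth = depth + 1;
      /end-free
     P                 E

      * Called by Expat for every end tag.
     P end             B
     D end             PI
     D   userData                      *   value
     D   elem                          *   value
      /free
        depth = depth - 1;
      /end-free
     P                 E
```

For the portion of the sample invoice shown above, the calls would arrive in this order: start for invoice, ShipTo and name; end for name; start for address and addrLine1; end for addrLine1; start and end for city, state and zipCode in turn; then end for address and ShipTo. Whatever the elided part of the document contains is processed the same way, and the final call is end for invoice. With these handlers, depth would be 4 while Expat is inside addrLine1 and would return to 0 after the closing invoice tag.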
To demonstrate this further, I've written a program called OUTLINE that uses start and end element handlers to make an outline of the XML document. In a future newsletter, I will take this a step further and show you how to handle the character data as well.
You can download my OUTLINE sample program from the iSeries Network Web site at the following link: http://iseries.pentontech.com/t?ctl=563F:103C2
You can download the source code for the iSeries port of the Expat XML parser from my Web site at the following link: http://iseries.pentontech.com/t?ctl=5654:103C2
The home page of the Expat open source project is http://iseries.pentontech.com/t?ctl=5656:103C2 .