RPG and the IFS: Text Files in the World

Article ID: 19626

This is the third article in the RPG and the IFS series. You may recall that in the first article, we examined the basics of stream files: how to open them, read from them, and write to them. In the second article, I explained a particular style of stream file called a text file (see "Other Articles in This Series," below, for a list of these articles).

In this article, I take a slight detour from explaining concepts and theories. Instead, I explain how the integrated file system (IFS) can benefit your business, including some samples of how you can use text files in today's business world.

IFS Advantages

One of the first things that people ask me when I start teaching them about the IFS is, "Why not just use CPYFRMIMPF or CPYFRMSTMF to copy the stream file to a physical file and read that? Wouldn't it be simpler?" I don't believe that it's simpler, but I agree that it's more familiar to most RPG programmers. Here are some reasons that working with IFS files directly is a better solution:

Compatibility with more file formats. Although commands such as CPYFRMIMPF (Copy From Import File) and CPYFRMSTMF (Copy From Stream File) allow several different file formats, there are many formats that such commands don't understand. If you write your own program to read the file, you have complete control over how the file is interpreted, giving you control over which formats you do and do not support.

Speed. Suppose that a business partner gives you a million-record stream file. You could copy it to a physical file and then read that physical file, but it would take much longer than reading the stream file directly.

Space. In the preceding example, you'd eventually have both a stream file and a physical file, requiring a lot of extra disk space. Physical files usually require significantly more space than stream files because of the overhead that fixed-length records require.

Simpler file management. If you have a copy of a file on your Windows server, then transfer it to the iSeries, then convert it to a physical file, that's three copies! What happens if one of them gets changed while you're working with the others? Using the IFS, you can open the file on the Windows server directly, put a lock on it to prevent something else from changing it, and process it.

Error handling. It's easy to check for errors from the open() API! It's not as easy to figure out every error that can occur in the process of creating a temporary file, copying data to it, and then opening it.

Comma-Separated Values

Most database and spreadsheet applications can export their data in comma-separated value (CSV) format. Some programming languages, such as Visual Basic (VB), can read and write files directly in CSV format. CSV is a simple format for data interchange and is used worldwide in millions of applications.

A CSV file is a text file in which each line of text is broken up into fields. Each field is variable in length and is separated from other fields by commas. Because fields often contain commas, quotation marks are usually placed around a character field. Everything within those quotation marks is a single field. Figure 1 shows a sample record from a CSV file.

Every year around Christmas, my company sends out catalogs of the gift boxes that we sell. We send our mailing list to a printing company that runs our list through the U.S. Postal Service's National Change of Address (NCOA) database. The printing company then prints the addresses onto the catalogs and mails the catalogs.

After this process is complete, the printing company sends me a CSV file that contains the updates that it made to the mailing list. Sometimes, the company sends the list on a CD-ROM or a floppy disk. Other times, it e-mails the list or lets me download it from the company's FTP server.

I then use this file to update our iSeries database. To do this, I created a service program, CSVR4, that reads a CSV file. Figure 2 shows this service program. The prototypes for the subprocedures in this service program are placed in a separate member so I can easily /COPY them into RPG programs when I want to use them. (The /COPY member isn't printed in this article, because you can glean the prototypes from the procedure interfaces. However, you can download the full source code for everything in this article at iSeriesNetwork.com/code.)

The CSV_open() routine does two things. It uses the fopen() API to open the stream file (A in Figure 2), and it creates a data structure. This data structure contains all the information that pertains to this particular file, and it's passed to each subprocedure in the service program. In the /COPY member, I created a template for that data structure, and I always use the LIKEDS keyword to reference it. That way, I can maintain information about this file from routine to routine, and I can declare separate data structures for each CSV file that I want to read. With this approach, this service program can handle having many CSV files open at once.

Each record in a CSV file is a line from a text file. The CSV_loadrec() routine loads the next line from disk into the data structure (B in Figure 2) and strips the trailing carriage return and line-feed characters from the record (C in Figure 2).

The CSV_getfld() subprocedure examines the record that is now stored in the peCSV.buf variable (D in Figure 2), looking for commas and quotation marks. It copies the field data from the record into a parameter called peFldData so that the caller can use the data. The CSV_close() routine (E in Figure 2) is called to close the stream file when you're finished reading records from it.
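The field-scanning logic described above can be sketched outside of RPG. The following is a minimal Python illustration, not the CSVR4 source: the function name csv_get_fields is hypothetical, and the treatment of a doubled quotation mark as a literal quote is a common CSV convention that I'm assuming here, not something the service program is known to do.

```python
def csv_get_fields(record):
    """Split one CSV record (a single line of text) into a list of fields."""
    # Strip the trailing carriage return and line feed, as CSV_loadrec() does
    record = record.rstrip("\r\n")
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == '"':
                # Assumption: a doubled quote inside a quoted field is a literal quote
                if i + 1 < len(record) and record[i + 1] == '"':
                    current.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                # Commas inside quotation marks are ordinary data
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            # An unquoted comma ends the current field
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(csv_get_fields('"Klement, Scott",123,"Milwaukee"\r\n'))
```

The key point is the in_quotes flag: a comma ends a field only when the scanner is outside a pair of quotation marks.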

Figure 3 shows a program that reads this CSV file and uses it to update my mailing list, which is a physical file. The LoadFields() subprocedure reads each field from the CSV record into a data structure (A in Figure 3). (I'm a big fan of data structures, if you haven't noticed!) The ProcessRec() subprocedure uses that data structure to update my mailing list (B in Figure 3). The program's mainline drives the entire process, opening the CSV file, looping through the records, and calling the subprocedures. Note that this sample program reads the NCOA file from my home directory on the iSeries. If I wanted the program to read from a CD-ROM, I'd change the path name so that it started with /QOPT instead of /home.

Writing an XML File

It seems that every day, people find a new use for XML and write new software that expects data to be in this format. XML is quickly becoming the primary method for data interchange between business applications.

Although parsing an XML file is complex and requires special software (DOM or SAX parsers, usually), writing an XML file is easy. Simply create a string that contains the data and XML tags that you need in your RPG program and write it to a stream file. BLAM! You have an XML document.

Figure 4 shows the DDS for a simple customer master file. I need to convert the data in this file into an XML file so that I can import it into an application that runs on a separate Web server. Figure 5 shows a sample of what the XML should look like when I'm finished. Figure 6 shows the RPG program that I wrote to get the job done. It loops through the records in the CUSTMAS file, wraps each field in XML tags, and writes the result to the stream file (A in Figure 6).
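To show how little is involved in writing XML as plain text, here is a minimal Python sketch of the same idea. It is not the program in Figure 6: the element names (customers, customer) and the field names in the usage example are hypothetical, because the CUSTMAS layout isn't reproduced here. The one step that's easy to forget is escaping characters such as & and <, which the sketch handles with the standard library's escape() function.

```python
from xml.sax.saxutils import escape


def write_customers_xml(path, customers):
    """Write a list of customer records (dicts) to a stream file as XML text."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write("<customers>\n")
        for cust in customers:
            f.write("  <customer>\n")
            for tag, value in cust.items():
                # escape() converts &, < and > so the data can't break the markup
                f.write(f"    <{tag}>{escape(str(value))}</{tag}>\n")
            f.write("  </customer>\n")
        f.write("</customers>\n")


if __name__ == "__main__":
    # Hypothetical field names; the real ones come from the file's DDS
    write_customers_xml("custmas.xml", [
        {"custno": 1001, "name": "Smith & Sons", "city": "Milwaukee"},
    ])
```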

Plain-Text E-mail Messages

E-mail messages are text files. Indeed, even when an e-mail message contains pictures, sound, or attachments, those files must be converted to something that follows a text file's rules before they can be sent. In this section, I demonstrate the process of sending shipment-notification e-mail messages to customers by reading some files that contain the orders that have been shipped, generating text files containing the e-mail message, and using the QtmmSendMail() API to send the messages. To use the sample programs that I describe in this section, you need to have the SMTP server configured and running properly on your iSeries.

Every e-mail message has two parts: the header and the body. The header consists of lines of text, each one containing a keyword and a value. A colon separates the keyword from the value. Typical keywords are From, To, Date, and Subject. A blank line (i.e., a line that consists of only CR and LF) signals that you've reached the end of the header. The rest of the e-mail document is the body. The body is a simple text file that contains the message for the recipient.
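The header-and-body layout is simple enough to sketch in a few lines. This Python illustration (not part of the IFS3MAIL1 program; the function names and the example.com addresses are made up for the example) builds a message as text and then splits it again at the first blank line.

```python
CRLF = "\r\n"


def build_message(sender, recipient, subject, body_lines):
    """Return an e-mail message as text: header lines, a blank line, then the body."""
    headers = [
        f"From: {sender}",
        f"To: {recipient}",
        f"Subject: {subject}",
    ]
    # The empty line (CR LF by itself) marks the end of the header
    return CRLF.join(headers) + CRLF + CRLF + CRLF.join(body_lines) + CRLF


def split_message(text):
    """Split a message into a dict of header keywords and the body text."""
    header_text, _, body = text.partition(CRLF + CRLF)
    headers = {}
    for line in header_text.split(CRLF):
        # A colon separates the keyword from the value
        keyword, _, value = line.partition(":")
        headers[keyword.strip()] = value.strip()
    return headers, body


if __name__ == "__main__":
    msg = build_message("orders@example.com", "customer@example.com",
                        "Your order has shipped",
                        ["Order 12345 shipped today.", "Thank you!"])
    print(split_message(msg))
```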

I've created a sample program, IFS3MAIL1, that notifies customers when their orders have shipped. Physical files SHPORDERS and SHPDETAIL store information about shipped orders. Figure 7 shows these files' layout. Once a day, I need to read these files, and for each order found, generate an e-mail message.

Figure 8 is the IFS3MAIL1 program, which reads these files. In the mainline, I simply read through the order list, and for each order found, I create an e-mail message in a text file by calling the CreateMsg() subprocedure, and then I send that text file by calling the SendMsg() subprocedure (A in Figure 8).

I use the QtmmSendMail() API to send the messages for me, because it's included with OS/400 and is relatively easy to use. This API requires that you have the TCP/IP Connectivity Utilities (5722-TC1) installed and have configured the OS/400 SMTP server. This API needs three different parameters to do its job: an ASCII text file in Coded Character Set Identifier (CCSID) 367, a sending e-mail address in CCSID 500, and a recipient e-mail address in CCSID 500. This API creates a background job to do the actual work of loading my stream file into the e-mail message queue, and it deletes the file when it's done with it.

The CreateMsg() subprocedure starts this process by creating a new stream file with a unique name. To create a unique name, I've passed the O_EXCL open flag to the open() API (B in Figure 8). O_EXCL means "exclusively create." In other words, the open() API fails if the file already exists. The API opens the file only if the file can be created as a new file. That way, I can add a number to the file name and call open() in a loop until it's able to successfully create a file with a unique name. To make sure that the "file already exists" error is the only one that causes my program to keep looping, I check errno's value each time open() fails. I keep looping only if the EEXIST error is the reason that the API failed.
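The same loop can be sketched in Python, whose os.open() passes the O_CREAT and O_EXCL flags straight through to the underlying open() call. This is an illustration of the technique, not the code in Figure 8; the function name, the file-name pattern, and the max_tries limit are all assumptions.

```python
import errno
import os


def create_unique_file(directory, prefix="mail", suffix=".txt", max_tries=10000):
    """Create a new file with a unique numbered name; return (descriptor, path)."""
    for n in range(1, max_tries + 1):
        path = os.path.join(directory, f"{prefix}{n}{suffix}")
        try:
            # O_EXCL makes open() fail if the file already exists
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except OSError as e:
            if e.errno == errno.EEXIST:
                # Name is taken: try the next number
                continue
            # Any other error (bad directory, no authority) is a real failure
            raise
        return fd, path
    raise RuntimeError("could not find an unused file name")


if __name__ == "__main__":
    fd, path = create_unique_file(".")
    os.write(fd, b"Subject: test\r\n\r\n")
    os.close(fd)
    print("created", path)
```

Because the existence check and the creation happen in a single open() call, two jobs running at the same time can't both end up with the same file.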

Next, CreateMsg() writes the e-mail message's header. Last, it writes the message body and closes the file. Figure 9 shows a sample of a finished text file.

IBM provides the iconv() API to let you convert data from one CCSID to another. You don't need iconv() for writing a text file, because the open() API can convert things automatically for you. However, according to the documentation, the e-mail addresses need to be in CCSID 500 when calling the QtmmSendMail() API. This is the first thing that's done in the SendMsg() subprocedure. Next, SendMsg() creates a data structure for the recipient e-mail address, as QtmmSendMail requires. Note that this data structure is an array! Because an e-mail message can have multiple recipients, it would be relatively easy to add more elements to this array, one for each recipient. Finally, SendMsg() calls the QtmmSendMail() API to send the actual message.
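To see what the conversion does to an address, here is a small Python sketch. It does not call iconv(); it uses Python's built-in cp500 codec, which I'm assuming is an adequate stand-in for CCSID 500 for this purpose. The function names and the example.com address are hypothetical.

```python
def to_ccsid_500(text):
    """Encode text as EBCDIC using Python's cp500 codec (stand-in for CCSID 500)."""
    return text.encode("cp500")


def from_ccsid_500(data):
    """Decode cp500 EBCDIC bytes back into text."""
    return data.decode("cp500")


if __name__ == "__main__":
    address = "customer@example.com"
    ebcdic = to_ccsid_500(address)
    # The EBCDIC bytes differ from the ASCII bytes for the same characters
    print(ebcdic.hex())
    print(from_ccsid_500(ebcdic))
```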

The prototypes for the QtmmSendMail() API and for the ADDTO0100 data structure are defined in the /COPY member called SENDMAIL_H, which isn't pictured here but is included in the downloadable code available at iSeriesNetwork.com/code.

E-mail with Attachments

Sometimes, plain-text e-mail messages aren't enough. People want to send PDF files, Word documents, and picture and sound files through e-mail. These documents typically aren't text files; they contain binary data. By the time the need to send more than plain text had become obvious to the engineers who devised the TCP/IP standards, it was too late! The standards said that the SMTP protocol would handle only text data, and it was in use in too many places to change it.

To solve the problem, engineers came up with the idea to use an algorithm called base64 encoding to convert binary data into text. Because an e-mail reader needed to be able to identify whether the data was encoded with base64 and whether the result was something it could display, new header keywords were required to describe the content type and the encoding method. Furthermore, because it's desirable to let a document contain both text and pictures, a message body needed to be divided into many parts.

The solution to these requirements is Multipurpose Internet Mail Extensions (MIME). With MIME, a boundary string is declared in a keyword in the e-mail message header. Then, in the message body, each time that boundary appears with two dashes as a prefix, a new part begins. When the boundary appears with two dashes both as a prefix and a suffix, it signals the end of the e-mail message.

Each MIME message part starts with the boundary and is immediately followed by a subheader that contains header keywords that give information about just that chunk of the message. Just like the main header of the e-mail message that I described in the preceding section, the subheader ends with a blank line.
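The boundary rules are easier to follow in code. This Python sketch (function names and the sample boundary are hypothetical, and it handles only the simple, non-nested case) builds a multipart body and then splits it back into parts using exactly the two rules above: two dashes in front start a part, and two dashes on both sides end the message.

```python
CRLF = "\r\n"


def build_multipart(boundary, parts):
    """Build a MIME body from a list of (subheader_lines, body_text) tuples."""
    out = []
    for subheaders, body in parts:
        out.append("--" + boundary)        # two-dash prefix: a new part begins
        out.extend(subheaders)
        out.append("")                     # blank line ends the subheader
        out.append(body)
    out.append("--" + boundary + "--")     # prefix and suffix: end of message
    return CRLF.join(out) + CRLF


def split_multipart(boundary, text):
    """Return each part as a list of lines (subheader, blank line, body)."""
    parts = []
    current = None
    for line in text.split(CRLF):
        if line == "--" + boundary + "--":
            break
        if line == "--" + boundary:
            if current is not None:
                parts.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        parts.append(current)
    return parts


if __name__ == "__main__":
    body = build_multipart("=.=sample=.=", [
        (["Content-Type: text/plain"], "Hello"),
        (["Content-Type: application/octet-stream",
          "Content-Transfer-Encoding: base64"], "SGVsbG8="),
    ])
    print(split_multipart("=.=sample=.=", body))
```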

Figure 10 shows a sample e-mail message that has MIME formatting and two parts. The first part is plain text, and the second part is base64 encoded.

How Base64 Encoding Works

A binary stream file, by definition, can consist of any byte values, not just the text characters that people can read. The problem is that there are 256 possible values for each byte, and fewer than half that many text characters. To encode a binary file as a text file, you have to reduce the number of possible values for each byte of data.

Base64 reduces the number of possible values by reading the stream file six bits at a time instead of eight bits at a time. Six bits have only 64 possible combinations, and there are more than 64 text characters. To encode the data as text, just assign one text character to each possible value from 0 to 63. Figure 11 shows the base64 alphabet, which is the table of numbers and their corresponding text characters.

Wait a minute! This is a neat idea, but computers aren't designed to work that way. You can't read six bits at a time from a stream file. You have to read in multiples of eight bits. In other words, one byte at a time is the smallest you can go.

With a little grade-school math, you come to the conclusion that the least common multiple of six and eight is 24. If you read the stream file three bytes at a time, you have 24 bits. Those 24 bits can be encoded into four text characters at six bits each. The diagram in Figure 12 illustrates this process.
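The three-bytes-to-four-characters step can be written out directly. This Python sketch uses the standard base64 alphabet and encodes exactly one three-byte group; handling a final group that is shorter than three bytes (the = padding) is deliberately left out. The function name is hypothetical.

```python
# The standard base64 alphabet: values 0 through 63 map to these characters
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(three_bytes):
    """Encode exactly three bytes (24 bits) as four base64 characters."""
    b0, b1, b2 = three_bytes
    # Join the three bytes into one 24-bit number
    bits = (b0 << 16) | (b1 << 8) | b2
    # Peel off four 6-bit values, most significant first
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


if __name__ == "__main__":
    print(encode_group(b"Man"))
```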

Sample Program that Demonstrates Base64 and MIME

Figure 13 shows some snippets from sample program IFS3MAIL2, which demonstrates the techniques that I described in the preceding section. (The complete program is available for download, along with the other sample programs in this article, at iSeriesNetwork.com/code.)

In this program, I've changed the CreateMsg() subprocedure to accept the name of a file to be attached to the e-mail message (A in Figure 13). The routine starts by trying to open this file, because if the file's not found, it makes sense to quit early (B in Figure 13). The routine then creates a temporary file for the e-mail message, just as it did in the preceding example, but then it creates a boundary (C in Figure 13). I put the characters =.= in the boundary because it's impossible for these characters to appear in a base64-encoded attachment, and it's unlikely that they would appear in the message text. To further ensure that the boundary string doesn't appear in the message text, I've added the current timestamp from the system clock to the boundary string (D in Figure 13). That way, if by a stroke of bad fortune the boundary does happen to appear in the message text, the boundary will be different the next time!
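A boundary built this way might look like the following Python sketch. The exact layout of the boundary in IFS3MAIL2 isn't shown here, so the format below (the =.= marker on each side of a timestamp) is only an assumption, as is the function name.

```python
from datetime import datetime


def make_boundary(now=None):
    """Build a MIME boundary from the =.= marker and a timestamp."""
    if now is None:
        now = datetime.now()
    # The period never appears in base64 output, so =.= can't occur in an
    # encoded attachment; the timestamp makes each message's boundary different
    return "=.=" + now.strftime("%Y%m%d%H%M%S%f") + "=.="


if __name__ == "__main__":
    print(make_boundary())
```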

This message's headers contain the text 'MIME-Version: 1.0' to indicate the MIME version that I've used and a line with the text 'Content-Type' to tell the e-mail software that the message has many parts and what the boundary for those parts is (D in Figure 13). The first part's header prepends two dashes to the boundary and then writes it and another content type to the text file. I've identified this part as 'text/plain', which means that it's plain text (E in Figure 13).

The next message part contains content that I've described as 'application/octet-stream' to any e-mail software that tries to interpret the message. The term "octet" means eight-bit byte, so this code is simply saying that it's a binary stream file that an application will understand. The header further says that during transfer, the data is encoded with base64 and should be treated as an attachment.

The program then reads through the attachment file and runs it through the base64 encoder, writing each chunk to the e-mail message. The program writes a final boundary at the end with two dashes both prefixing and suffixing the boundary. This tells the e-mail software that it has reached the end of the MIME-encoded message (F in Figure 13).
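The read-encode-write loop can be sketched as follows. This Python version leans on the standard library's base64 module, not a hand-written encoder, and the function name and the 57-byte chunk size are my own choices. The chunk size is a multiple of three, so only the last chunk can need padding, and each encoded line comes out at 76 characters or fewer.

```python
import base64
import io


def write_attachment_and_close(src, dst, boundary, chunk_size=57):
    """Base64-encode binary data from src into dst, then end the MIME message."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # 57 bytes become 76 text characters, one line of the message
        dst.write(base64.b64encode(chunk).decode("ascii") + "\r\n")
    # Two dashes before and after the boundary mark the end of the message
    dst.write("--" + boundary + "--\r\n")


if __name__ == "__main__":
    attachment = io.BytesIO(bytes(range(256)))   # stand-in for a binary stream file
    message = io.StringIO()                      # stand-in for the e-mail text file
    write_attachment_and_close(attachment, message, "=.=sample=.=")
    print(message.getvalue())
```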

More to Come

I hope that this article has familiarized you with text files' usefulness. In the next installment of this series, I'll go into more depth about binary stream files and explain how to create them and what they're useful for.

Scott Klement is the editor of the Club Tech iSeries Programming Tips e-mail newsletter and a forum pro on the iSeries Network. He is also the IS manager at Klement Sausage Co., Inc. You can e-mail Scott at iSN@ScottKlement.com.


Other Articles in This Series

You'll find the following articles in the RPG and the IFS series at iSeriesNetwork.com:

Introduction to Stream Files, November 2004, article ID 19312

A Text File Primer, December 2004, article ID 19473
