I'm frequently asked how to translate data from EBCDIC to ASCII, or EBCDIC to Unicode, or between the character sets used by different cultures. In most cases, the best solution to these translations is the iconv() API.
The key to using iconv() on the iSeries is knowing which Coded Character Set Identifiers (CCSIDs) you need to translate between. A CCSID is a number that identifies a character set that has been encoded a particular way. For example, to identify the character set used in the U.S. when encoded in single-byte EBCDIC, we can refer to CCSID 37. The number 37 is just an identifying number that IBM assigns to that particular character set when it's encoded as EBCDIC so that when the time comes to translate to or from U.S. EBCDIC, all we need to specify is 37 for the CCSID parameter.
To get started with iconv(), you have to open a "conversion descriptor." That's a technical way of saying that the system needs to find the right translation table and reserve some memory for work variables that it uses internally. To do that, you pass the CCSIDs to the QtqIconvOpen() API. It takes care of finding the right table, loading it into memory for quick access, and reserving memory for the internal work variables. Here's an example of opening a conversion descriptor:
      /copy iconv_h
     D from            ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D to              ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D table           ds                  likeds(iconv_t)
      /free
        from.CCSID = 37;
        to.CCSID = 819;
        table = QtqIconvOpen(to: from);
        if (table.return_value = -1);
           errMsg = 'Unable to load translation table';
           // FIXME: show message to user.
           *inlr = *on;
           return;
        endif;
To make calling iconv() as simple as possible, I put all the definitions that I need in the ICONV_H source member, and I use the /COPY directive to bring those definitions into each program that uses iconv(). In the preceding code, "from" and "to" are copies of a data structure stored in the ICONV_H member. The only field in that data structure that I need to change is the CCSID field, so that I can tell the API which CCSIDs to convert between. I pass the data structures containing these CCSIDs to QtqIconvOpen(), and it finds the table, reserves memory for work variables, and returns a conversion descriptor. The descriptor is actually a data structure also defined in ICONV_H. It contains a subfield called return_value, and I can check that subfield to verify that QtqIconvOpen() completed successfully.
Now that I have the translation table loaded, I can pass it to the iconv() API to translate some data. The prototype for the iconv() API is defined in ICONV_H as well. This is what the prototype looks like:
     d iconv           PR            10I 0 extproc('iconv')
     d   cd                                like(iconv_t) value
     d   inbuf                         *
     d   inbytesleft                 10U 0
     d   outbuf                        *
     d   outbytesleft                10U 0
The first parameter to iconv() is the conversion descriptor. The remaining parameters are a pointer to the next character to convert, the number of bytes left to convert, a pointer to the memory where the next translated character should be stored, and the number of bytes that remain in the output buffer.
The iconv() API reads your input data one character at a time and converts each one to an output character. After each character is translated, iconv() advances both pointers to the next position and decreases the bytes-left counts for the input and output buffers. It keeps doing this in a loop (converting each character and updating the parameters) until it runs out of characters to translate, runs out of space in the output buffer, or finds a character that it can't translate.
Because the pointers and bytes-left fields are updated as iconv() runs, if it stops early (for example, because the output buffer is full), you can call iconv() again, and it picks up where it left off.
Here's an example of translating a string from EBCDIC to ASCII using the conversion descriptor from the preceding code snippet:
     D p_input         s               *
     D inleft          s             10U 0
     D p_output        s               *
     D outleft         s             10U 0
     D input_data1     s             50A
     D output_data1    s            200A
        .
        .
        input_data1 = 'Hello, my name is Scott';
        output_data1 = *blanks;
        p_input = %addr(input_data1);
        inleft = %len(input_data1);
        p_output = %addr(output_data1);
        outleft = %size(output_data1);
        iconv( table
             : p_input
             : inleft
             : p_output
             : outleft );
In the preceding code snippet, I start by pointing the input and output pointers to variables in my program. I set up the "bytes left" fields to be the length of the data to translate and the amount of memory to receive the results. I then call iconv() to perform the translation.
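Because iconv() updates the pointers and counters in place, a call that stops when the output buffer fills up can simply be repeated. The following is a minimal sketch of that idea rather than a definitive implementation. It reuses table, p_input, inleft, p_output, outleft, and input_data1 from the preceding snippets, and it assumes the iconv() prototype shown earlier (a 10I 0 return value that is -1 on failure). The chunk, chunklen, and rc fields are hypothetical names introduced only for this illustration, and the 16-byte buffer is deliberately small so that several calls are needed.

```rpgle
      * Hypothetical work fields for this sketch only.
     D chunk           s             16A
     D chunklen        s             10U 0
     D rc              s             10I 0
      /free
        p_input = %addr(input_data1);
        inleft  = %len(input_data1);

        dow (inleft > 0);
           // Aim the output pointer at the start of the small buffer.
           p_output = %addr(chunk);
           outleft  = %size(chunk);

           // iconv() resumes at p_input, where the last call stopped.
           rc = iconv(table: p_input: inleft: p_output: outleft);

           // Number of bytes this call placed in chunk.
           chunklen = %size(chunk) - outleft;

           if (rc = -1 and chunklen = 0);
              // Failed without converting anything, so another call
              // would not help. Give up.
              leave;
           endif;

           // ... use %subst(chunk: 1: chunklen) here ...
        enddo;
```

A stricter version would inspect errno to tell a full output buffer apart from an untranslatable character. The sketch sidesteps that by stopping as soon as a call makes no progress.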
Translating data stored in a VARYING string is a little more complicated because the API doesn't know anything about VARYING. You see, VARYING is an RPG concept in which a character string is prefixed by a two-byte field containing the current length of the string. Because the API knows nothing about that prefix, we skip those two bytes (by adding two to the pointer) so that it translates the character data without even knowing that the string is VARYING. Here's an example of using iconv() with a VARYING string:
     D input_data2     s             50A   varying
     D output_data2    s            200A   varying
        .
        .
        input_data2 = 'Goodbye, it was nice meeting you!';
        %len(output_data2) = %size(output_data2) - 2;
        p_input = %addr(input_data2) + 2;
        inleft = %len(input_data2);
        p_output = %addr(output_data2) + 2;
        outleft = %len(output_data2);
        iconv( table
             : p_input
             : inleft
             : p_output
             : outleft );
        %len(output_data2) = %len(output_data2) - outleft;
Because a pointer points to a particular byte in memory, adding two to that pointer points two bytes later in memory. Therefore, it effectively skips over the length that's prefixed to the VARYING fields.
Because the output variable is also VARYING, I set its length to the maximum that the field can hold before I call iconv(). After the conversion is complete, I use the "bytes left" field to determine how much data was actually placed in the output field, and I adjust the length accordingly.
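If you convert VARYING strings in many places, the pointer arithmetic can be tucked into a subprocedure. Here's a rough sketch under a few assumptions: toAscii and its local fields are hypothetical names, table is a conversion descriptor that was opened earlier and is defined globally, and returning an empty string on failure is just one possible policy.

```rpgle
      * Hypothetical helper. Place the prototype with the program's
      * other D-specs and the procedure after the main calculations.
     D toAscii         PR           200A   varying
     D   src                         50A   varying value

     P toAscii         B
     D toAscii         PI           200A   varying
     D   src                         50A   varying value
     D result          s            200A   varying
     D p_in            s               *
     D p_out           s               *
     D in_left         s             10U 0
     D out_left        s             10U 0
      /free
        // Open the result up to its maximum length before converting.
        %len(result) = %size(result) - 2;

        // Skip the two-byte length prefix on both VARYING fields.
        p_in     = %addr(src) + 2;
        in_left  = %len(src);
        p_out    = %addr(result) + 2;
        out_left = %len(result);

        if (iconv(table: p_in: in_left: p_out: out_left) = -1);
           // Assumed policy: hand back an empty string on failure.
           %len(result) = 0;
           return result;
        endif;

        // Trim the result to the bytes that iconv() actually wrote.
        %len(result) = %len(result) - out_left;
        return result;
      /end-free
     P                 E
```

With that in place, the preceding example would shrink to a single statement such as output_data2 = toAscii(input_data2);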
You can use the same conversion descriptor to translate as many strings as you like. When you're done converting data with iconv(), you should call the iconv_close() API. This lets the system free up the memory for its internal work variables so that the memory is available for other tasks.
Here's an example of calling iconv_close():
        iconv_close(table);
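To illustrate reusing one descriptor for several strings before closing it, here's a small sketch. The names, ascii, and i fields (and the sample values) are hypothetical; table, p_input, inleft, p_output, and outleft are assumed to be the fields from the earlier snippets, with table already opened for CCSID 37 to 819.

```rpgle
      * Hypothetical sample data for this sketch only.
     D names           s             10A   dim(3)
     D ascii           s             10A   dim(3)
     D i               s             10I 0
      /free
        names(1) = 'Scott';
        names(2) = 'iSeries';
        names(3) = 'EBCDIC';

        for i = 1 to %elem(names);
           // %size() of an array name is the size of one element.
           p_input  = %addr(names(i));
           inleft   = %size(names);
           p_output = %addr(ascii(i));
           outleft  = %size(ascii);
           iconv(table: p_input: inleft: p_output: outleft);
        endfor;

        // Every string is done, so release the descriptor.
        iconv_close(table);
```

Only the pointers and counters are reset for each string. The descriptor itself is opened once and closed once.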
Instead of specifying a CCSID when you call QtqIconvOpen(), you can specify a special value of zero. If you specify zero, it tells the API that you'd like to use the default CCSID for the current job. For example, instead of hard-coding 37 for the EBCDIC CCSID of my data in the previous examples, I could've specified zero as follows:
      /copy iconv_h
     D from            ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D to              ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D table           ds                  likeds(iconv_t)
      /free
        from.CCSID = 0;
        to.CCSID = 819;
        table = QtqIconvOpen(to: from);
        if (table.return_value = -1);
           errMsg = 'Unable to load translation table';
           // FIXME: show message to user.
           *inlr = *on;
           return;
        endif;
Using the job's default CCSID is especially useful when the data that you translate is data that the user keyed in. It saves you the effort of trying to figure out what CCSID the user's data will be. Assuming that the job's CCSID was set up properly, it'll be the correct one for the data that the user types.
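The special value zero isn't limited to the "from" side. As a sketch (the field names fromAscii, toJob, and toEbcdic are hypothetical, and errMsg is assumed to be defined elsewhere, as in the earlier snippets), this opens a second descriptor for the opposite direction, translating ASCII data into the job's CCSID:

```rpgle
      /copy iconv_h
     D fromAscii       ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D toJob           ds                  likeds(QtqCode_T)
     D                                     inz(*LIKEDS)
     D toEbcdic        ds                  likeds(iconv_t)
      /free
        fromAscii.CCSID = 819;
        // Zero asks for the current job's CCSID on the "to" side.
        toJob.CCSID = 0;

        // As before, the "to" structure is passed first.
        toEbcdic = QtqIconvOpen(toJob: fromAscii);
        if (toEbcdic.return_value = -1);
           errMsg = 'Unable to load translation table';
           *inlr = *on;
           return;
        endif;
```

A descriptor like this would be handy for data that arrives in ASCII and needs to be shown to the user, and it should be closed with iconv_close() when you're done with it, just like the first one.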
I've put the code examples from this article, and all the definitions that I use with iconv(), into a zip file that you can download. It's available at the following link:
http://www.pentontech.com/IBMContent/Documents/article/52786_78_IconvDemo.zip
You'll find IBM's documentation for iconv() and related APIs in the Information Center. These APIs are part of the Code Conversion subcategory of the National Language Support APIs category. Here's a link to that section of the Information Center:
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/topic/apis/nls3.htm