Published on System iNetwork (http://systeminetwork.com)
Validate E-Mail Address with a Regular Expression
By tzura
Created Jul 9 2006 - 07:00

By: Scott Klement [1]

Q: We need to catch errors during data entry when someone keys in an e-mail address. On the Web, I've found regular expressions that can be used for this. How can I use them in RPG?

A: ILE C comes with routines that can be used to compile and run a regular expression. Because one ILE language can call another's routines, you can use these from RPG to execute a regular expression. In this article, I demonstrate how to write an RPG program that uses a regular expression to validate an e-mail address.

No method of checking an e-mail address is foolproof. The rules allow a surprisingly wide range of characters and formats in an e-mail address, which makes it very difficult to tell from the text alone whether an address is right or wrong.

Back in November 2003, I published a utility that used DNS to validate an e-mail address. You can find that tool at the following link:
http://www.iseriesnetwork.com/article.cfm?id=17633 [2]

The problem with using DNS to validate e-mail addresses is that if there are any network errors, either on your network or on the one that hosts the DNS for the address you're verifying, the address comes up as "bad." For example, if a data-entry person keyed in bill@example.com [3], and example.com's Internet connection happened to be down for a few minutes, the address would come up as invalid because the DNS lookup failed!

In addition to potential problems with the network going down, you might also run into performance problems. A DNS lookup can sometimes take several seconds to complete.

The regular expression approach solves these problems because it checks only the format of the e-mail address and doesn't do any sort of network access. If all you want to do is catch keying errors, this approach might be best for you.

Most regular expressions used to validate e-mail addresses have caveats, and the one I'm about to demonstrate is no exception. For details about this regular expression, including its caveats, see the following Web page:
http://www.regular-expressions.info/email.html [4]
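To make those caveats concrete, here is a minimal C sketch. It uses the same pattern as this article and calls the POSIX regcomp()/regexec() functions, which the rest of the article calls from RPG. It assumes a standard `<regex.h>`. The function name looks_valid() and the sample addresses are my own hypothetical examples.

```c
#include <regex.h>
#include <stddef.h>

/* The article's pattern; in a C string literal the backslash before
   the final dot has to be doubled. */
#define EMAIL_PATTERN "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$"

/* Hypothetical helper: returns 1 if addr matches EMAIL_PATTERN,
   0 if it does not (or if the pattern fails to compile). */
int looks_valid(const char *addr)
{
    regex_t reg;
    int ok;

    if (regcomp(&reg, EMAIL_PATTERN, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&reg, addr, 0, NULL, 0) == 0);
    regfree(&reg);
    return ok;
}
```

With this pattern, looks_valid("bill@example.com") returns 1. However, "bill+news@example.com" is rejected even though + is permitted in the local part of an address. "bill@example.museum" is also rejected, because the final label is longer than four letters. Going the other way, "bill@.example.com" is accepted even though a domain cannot begin with a dot. These are the kinds of trade-offs the linked page describes.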

The ILE C runtime library contains the following APIs that I use in my program to validate an e-mail address:

regcomp()  -- compile a regular expression
regexec()  -- execute a compiled regular expression
regfree()  -- free up memory used by a compiled regular expression
regerror() -- convert an error code from regcomp() or regexec() into a readable message
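As a rough illustration of how these functions fit together, here is a C sketch. The functions are the standard POSIX ones; the helper name try_compile() and its parameters are hypothetical. The helper compiles a pattern with the flags used later in this article. If compilation fails, it calls regerror() to turn the return code into text. C combines the flags with `|`, whereas the RPG code adds them with `+`. Both give the same value because each flag occupies its own bit.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: try to compile pattern. On failure, regerror()
   fills msg with a readable description and regcomp()'s nonzero return
   code is passed back. On success, msg is set to "" and the compiled
   expression is released with regfree(). */
int try_compile(const char *pattern, char *msg, size_t msglen)
{
    regex_t reg;
    int rc = regcomp(&reg, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB);

    if (rc != 0) {
        /* Writes at most msglen bytes, including the terminating NUL. */
        regerror(rc, &reg, msg, msglen);
        return rc;
    }

    if (msglen > 0)
        msg[0] = '\0';
    regfree(&reg);  /* a real program would keep reg for regexec() */
    return 0;
}
```

For example, try_compile("^[A-Z0-9", msg, sizeof msg) should fail because the bracket expression is never closed, and msg then holds the library's description of that error. The exact wording varies from one C runtime to another.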

I created a copybook named REGEX_H that contains the prototypes and other definitions I need when working with regular expressions.

The first step is to compile the regular expression. Compiling converts the pattern text into an internal form in memory, which regexec() can then run against any string. Once compiled, the expression executes very quickly. Here's the code that compiles the e-mail address validation expression:

        if (not compiled);
            // Compile the pattern only on the first call.
            pattern = '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$';
            rc = regcomp( reg
                        : pattern
                        : REG_EXTENDED + REG_ICASE + REG_NOSUB );
            if rc <> 0;
               // handle error; regerror() can describe what went wrong.
            else;
               compiled = *on;
            endif;
        endif;

The regular expression is stored in the variable named pattern. I pass it to the regcomp() API, which compiles it and stores the compiled result in the reg variable. The third parameter to regcomp() is a set of flags that describe how the pattern should be interpreted. I set the compiled flag only when regcomp() succeeds, so that a failed compile is never used later.

Regular expressions come in two flavors. The original ones are called BASIC regular expressions, and the newer ones are called EXTENDED. My pattern uses EXTENDED syntax (+ for "one or more" and {2,4} for "two to four times"), so I pass REG_EXTENDED to the API. In BASIC syntax, + is an ordinary character and a repeat count must be written as \{2,4\}. For information about how regular expressions work, and the difference between BASIC and EXTENDED, please see the following link:
http://en.wikipedia.org/wiki/Regular_expression [5]
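To see why the flag matters, this C sketch compiles the same pattern both ways using the POSIX regcomp() and regexec() functions. The helper name matches_with() is hypothetical. Under BASIC syntax, the + and {2,4} are treated as literal characters, so an ordinary address is not expected to match.

```c
#include <regex.h>
#include <stddef.h>

#define EMAIL_PATTERN "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$"

/* Hypothetical helper: compile EMAIL_PATTERN with the given syntax flag
   (REG_EXTENDED, or 0 for BASIC) and test addr. Returns 1 on a match,
   0 on no match, and -1 if the pattern does not compile that way. */
int matches_with(int syntax, const char *addr)
{
    regex_t reg;
    int result;

    if (regcomp(&reg, EMAIL_PATTERN, syntax | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    result = (regexec(&reg, addr, 0, NULL, 0) == 0);
    regfree(&reg);
    return result;
}
```

matches_with(REG_EXTENDED, "bill@example.com") returns 1. With 0 (BASIC) in place of REG_EXTENDED, the same address does not match.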

The REG_ICASE flag tells regcomp() that the regular expression should be case-insensitive. My pattern lists only the uppercase range A-Z, so this flag is what lets an address typed in lowercase, such as bill@example.com, match.
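A small C sketch shows the effect. It uses standard POSIX calls and a hypothetical helper, matches_flags(). Without REG_ICASE, the uppercase-only ranges reject a lowercase address. The sketch assumes the program runs in the default "C" locale, where [A-Z] covers only the 26 uppercase letters.

```c
#include <regex.h>
#include <stddef.h>

#define EMAIL_PATTERN "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$"

/* Hypothetical helper: returns 1 if addr matches EMAIL_PATTERN compiled
   with REG_EXTENDED, REG_NOSUB and any extra flags (such as REG_ICASE);
   returns 0 on no match or if the pattern fails to compile. */
int matches_flags(int extra, const char *addr)
{
    regex_t reg;
    int ok;

    if (regcomp(&reg, EMAIL_PATTERN, REG_EXTENDED | REG_NOSUB | extra) != 0)
        return 0;
    ok = (regexec(&reg, addr, 0, NULL, 0) == 0);
    regfree(&reg);
    return ok;
}
```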

Regular expressions can also report where each parenthesized subexpression matched, which lets you pull substrings out of a larger string. I don't need that to verify an e-mail address, so I specify REG_NOSUB to tell the API that a simple yes-or-no answer is enough. This can improve performance a bit.
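For contrast, here is a C sketch of what REG_NOSUB gives up. It wraps the domain part of the pattern in parentheses and compiles without REG_NOSUB. It then uses the regmatch_t array that regexec() fills in to copy that substring out. The helper extract_domain() is hypothetical; regmatch_t and its rm_so and rm_eo members are standard POSIX.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if addr matches, copy its domain part into out
   and return 1. Returns 0 if regexec() reports no match, and -1 on a
   compile error or if out is too small. */
int extract_domain(const char *addr, char *out, size_t outlen)
{
    regex_t reg;
    regmatch_t m[2];   /* m[0]: whole match; m[1]: the (...) group */
    size_t len;
    int rc;

    /* Same pattern with the domain in parentheses, and no REG_NOSUB. */
    if (regcomp(&reg, "^[A-Z0-9._%-]+@([A-Z0-9.-]+\\.[A-Z]{2,4})$",
                REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    rc = regexec(&reg, addr, 2, m, 0);
    regfree(&reg);
    if (rc != 0)
        return 0;

    /* rm_so and rm_eo are byte offsets of the group within addr. */
    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outlen)
        return -1;
    memcpy(out, addr + m[1].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

Called with "bill@example.com", this should place "example.com" in out. Validation needs none of this bookkeeping, which is why REG_NOSUB is the better fit here.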

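Once my regular expression is compiled, I can use it over and over again. I therefore use the compiled variable as a flag: the program compiles the expression only on the first call and reuses the compiled form on every call after that. This works only as long as reg and compiled keep their values from one call to the next, as they do when the program returns without setting on *INLR.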
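The same compile-once idea looks like this in C. A static regex_t and a static flag persist between calls, much like the RPG program's reg and compiled variables. This is a sketch: check_email() and the compile_count counter are hypothetical, and the counter exists only so the behavior can be observed.

```c
#include <regex.h>
#include <stddef.h>

#define EMAIL_PATTERN "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$"

static regex_t email_reg;      /* compiled form, kept between calls     */
static int     email_compiled; /* plays the role of the "compiled" flag */
static int     compile_count;  /* hypothetical: how often regcomp() ran */

/* Returns 1 if addr matches, 0 if not, -1 if the pattern won't compile. */
int check_email(const char *addr)
{
    if (!email_compiled) {
        compile_count++;
        if (regcomp(&email_reg, EMAIL_PATTERN,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;         /* flag stays 0, so a later call retries */
        email_compiled = 1;
    }
    return regexec(&email_reg, addr, 0, NULL, 0) == 0;
}
```

After any number of calls, compile_count stays at 1. The sketch never calls regfree(), on the assumption that the compiled expression should live as long as the program does.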

Now that my regular expression has been compiled, I can execute it against the e-mail address to check the address:

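        // regexec() returns 0 when EmailAddr matches the pattern;
        // any other value (no match, or an error) means invalid.
        if (regexec( reg
                   : %trim(EmailAddr)
                   : 0
                   : match
                   : 0 ) = 0);
           valid = *on;
        else;
           valid = *off;
        endif;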

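The first two parameters are the compiled regular expression and the e-mail address to check. The third and fourth parameters (the number of entries in the match array, and the array itself) are for returning the positions of matching substrings. Because I compiled my regular expression with REG_NOSUB, they're ignored. The last parameter holds execution flags; I don't need any, so I pass 0.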
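In C terms, those five arguments are the regex_t, the string, nmatch, pmatch, and eflags. This sketch also separates the two kinds of nonzero return: REG_NOMATCH means the address simply didn't match, while any other code signals an error. The classify() helper and its ADDR_* codes are hypothetical. The RPG code above treats both outcomes as invalid, which is a reasonable choice for data entry.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { ADDR_ERROR = -1, ADDR_INVALID = 0, ADDR_VALID = 1 };

/* Run an already-compiled expression against addr and classify the
   outcome of regexec(). */
int classify(const regex_t *reg, const char *addr)
{
    int rc = regexec(reg,    /* compiled expression                 */
                     addr,   /* NUL-terminated string to test       */
                     0,      /* nmatch: no substring slots wanted   */
                     NULL,   /* pmatch: no array needed             */
                     0);     /* eflags: no REG_NOTBOL / REG_NOTEOL  */

    if (rc == 0)
        return ADDR_VALID;
    if (rc == REG_NOMATCH)
        return ADDR_INVALID;
    return ADDR_ERROR;       /* e.g. REG_ESPACE: out of memory */
}
```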

In the preceding example, I set an indicator named valid to *ON if the e-mail address matches the regular expression, and to *OFF if it does not.

I've written a sample program that demonstrates validating an e-mail address. It receives the e-mail address in the first parameter and passes back the valid indicator in the second parameter, so you can call it from other programs when those programs need to validate addresses. You can download my sample program from the following link:
http://www.pentontech.com/IBMContent/Documents/article/52826_80_MailChk.zip [6]
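A fixed-length RPG character field such as EmailAddr is padded with blanks. That is why the code uses %trim(): regexec() works on a NUL-terminated string, and trailing blanks would keep the $ anchor from matching. The C sketch below mimics the program's interface. mail_check(), its len parameter, and the '1'/'0' values standing in for *ON/*OFF are assumptions for illustration, not the downloadable program's actual code.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define EMAIL_PATTERN "^[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}$"

/* Hypothetical C counterpart of the sample program's interface.
   addr:  blank-padded, fixed-length field (not NUL-terminated)
   len:   declared length of that field
   valid: receives '1' (like *ON) or '0' (like *OFF)               */
void mail_check(const char *addr, size_t len, char *valid)
{
    static regex_t reg;
    static int compiled = 0;
    char buf[257];
    size_t start = 0, end = len;

    *valid = '0';

    if (!compiled) {
        if (regcomp(&reg, EMAIL_PATTERN,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return;
        compiled = 1;
    }

    /* Equivalent of %trim(): drop leading and trailing blanks. */
    while (start < end && addr[start] == ' ')
        start++;
    while (end > start && addr[end - 1] == ' ')
        end--;

    if (end - start >= sizeof buf)
        return;                     /* too long for this sketch's buffer */
    memcpy(buf, addr + start, end - start);
    buf[end - start] = '\0';        /* regexec() needs a terminated string */

    if (regexec(&reg, buf, 0, NULL, 0) == 0)
        *valid = '1';
}
```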

Note: Regular expressions rely on several characters, such as the square brackets and curly braces, that are assigned to different code points in different EBCDIC character sets. The sample program that I provide works correctly if your system is running CCSID 37. However, if you're using another character set, you might encounter problems. One simple workaround is to use the iconv() API to translate pattern and EmailAddr from your native character set to CCSID 37 before calling the regcomp() and regexec() APIs. For more information about calling the iconv() API from an RPG program, see the following article from the June 29, 2006, issue of this newsletter:
http://www.iseriesnetwork.com/article.cfm?id=52786 [7]

Copyright © Penton Media

Source URL: http://systeminetwork.com/article/validate-e-mail-address-regular-expression

Links:
[1] http://systeminetwork.com/author/scott-klement
[2] http://www.iseriesnetwork.com/article.cfm?id=17633
[3] mailto:bill@example.com
[4] http://www.regular-expressions.info/email.html
[5] http://en.wikipedia.org/wiki/Regular_expression
[6] http://www.pentontech.com/IBMContent/Documents/article/52826_80_MailChk.zip
[7] http://www.iseriesnetwork.com/article.cfm?id=52786