As a follow-up to last month's article titled "Application Mapping on IBM i: Present and Future," let's look at how you can use application mapping to actively change an entire system programmatically. Using the application map as a primary input, some simple reengineering concepts, and a fair amount of time spent perfecting them, you can write programs to update application programs. This approach has saved many companies literally thousands of man-hours and millions of dollars.
Writing programs to update your programs is typically a way to make structural changes to the application source, not functional changes. When a system enhancement produces a large number of fairly simple system-wide changes, programmatic automation of these changes begins to make sense. The most obvious example of this is Y2K. Some companies spent as much as five million dollars to change their systems for Y2K compliance. Other companies used programs to carry out the same amount of work on similar-sized systems for five percent of the cost. How did they do that, and why is this relevant nearly 10 years later?
After an application's life of 20 to 30 years, it's fairly safe to assume that there might be a business demand to change important and well-used fields in the database. This demand might be driven by industry standardization, system integration, upgrades, internationalization, or commercial growth (e.g., you run out of invoice numbers or even customer numbers).
Y2K affected almost every RPG application in existence, and in each case it affected just about the entire application. Since 2000, most systems have grown at a rate of 10 percent per year. It's widely acknowledged that RPG resources haven't kept pace with this growth; in reality, they've probably shrunk at a similar rate each year. So although database changes are now generally industry or company specific, the problems and their related solutions remain the same, but with more code affected and fewer people to fix it.
There are several applications for automated reengineering of a system, which I briefly mention later in this article. Solving a field-expansion problem is, however, relevant for many companies, so I use it to flesh out the subject of this article in more detail.
A more conventional approach to solving a field-expansion problem is to get a feel for the scope and size of the problem, understand clearly the requirements for the change, and then send one or many developers off to fix the problem one program at a time. Figure 1 illustrates this manual approach.
Many problems are associated with this approach. Here are a few:
The upside is of course that such an approach requires little preparation and little initial investment in time or money. It's also generally flexible and therefore useful for small projects. There is, however, a risk that people will fail to identify every required change, or will apply the changes inconsistently across the programs they touch. As the size of the system increases, the risk of failure increases exponentially.
The basis of an engineered approach is to break down the process into a set of discrete, repeatable, and automated steps. Each step is then applied across the entire system or project and repeated until an optimum result is achieved. Figure 2 shows diagrammatically how this approach compares to a conventional manual approach.
Many benefits are associated with a structured and engineered approach. Some of these include:
Without an explicit, detailed, and very precise measurement of the impact of a database change across the system, automating the required changes would be impossible. Let's start by looking at this task in more detail.
Even in well-designed and well-documented systems, the impact of changing the database of an integrated and complex application on IBM i can be huge. Just the recompilation and data copying tasks can create logistical nightmares. The most significant and difficult task is of course measuring the impact on source code across the entire system. If analysis is done right, subsequent work will be highly predictable and measurable. If analysis is done incorrectly, the results could be catastrophic. Overruns in project timelines are just one possible impact, and I don't think I need to elucidate the potential outcome of having "missed" something in a production system.
Specifying fields to be changed. The first task in the analysis stage is to specify which fields need changing in the database. This task should be straightforward but may be complicated by virtue of integrated systems, poor documentation, or often a combination of both. The next step, which I describe in a moment, may actually produce results that warrant additional fields being added and included in the process.
Finding where fields are used. The next step is to establish precisely where these fields are used throughout the system. This is where things start to get tricky. Establishing the explicit use of a given field by its name can be achieved with a simple Find String Using PDM (FNDSTRPDM) command. You then need to start at these specific points and establish where these fields are associated with any other variable or data construct, by virtue of a compute or definition statement. There's only one way to do this, and that's to read the source code of every single instance in which the field being changed is used. RPG applications have many technical constructs that make this type of analysis complex and time consuming. For example:
Legacy cross-reference tools can help with this analysis up to a point. That point ends at each level or instance of a variable. As a result, many individual queries, sometimes thousands, need to be run and their results amalgamated when using these older technologies. Figure 3 shows a simple example of conventional approaches being used to trace the CUSNO field.
The obvious answer to this problem is to prebuild an application map of the entire system being analyzed, where variable-field-variable associations are instantly available. Using this map, you can write a program that traces a field throughout all its iterations and variants across the entire system in a single query. Some of the trace work is accomplished in a previous stage in the form of prebuilding the application map. Let's look at an example of this at work.
Figure 4 shows the source of a CLP named CUSLET. If I were to carry out a traditional analysis on a system with this program in it, looking for the impact of a change to the field CUSNO, this program wouldn't show in the results.
Figure 5, however, shows a snippet of the source of an RPG program that calls CLP CUSLET passing the parameter CUSNO. Figure 6 shows the spreadsheet of the results of our extraction program written over the application map, and we can see that CUSLET has been included in the analysis results. This is because the parameter CUSNO was passed to CUSLET from the RPG program displayed in Figure 5.
The output of this analysis is a specific list of all source members and lines therein that are affected by the proposed field changes.
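To make the idea of a single-query trace concrete, here is a minimal Python sketch. It assumes the application map can be exported as rows of (member, line, name, name) associations; that row layout, the sample rows, and the qualified names such as ORDENT.WKCUS are hypothetical and not taken from any particular tool. The trace is a plain breadth-first walk over the associations, so it deliberately over-reports rather than risk missing something.

```python
from collections import deque

# Hypothetical application-map rows: (member, line, name_a, name_b).
# Each row says the two names are associated on that source line
# (a move/compute, a definition, or a call parameter).
ROWS = [
    ("ORDENT", 120, "ORDENT.WKCUS", "CUSNO"),
    ("ORDENT", 245, "ORDENT.SAVCUS", "ORDENT.WKCUS"),
    ("ORDENT", 310, "CUSLET.&CUSNO", "CUSNO"),          # parameter passed to the CLP
    ("CUSLET", 12, "CUSLET.&MSGKEY", "CUSLET.&CUSNO"),  # variable set inside the CLP
    ("ORDENT", 400, "ORDENT.WKORD", "ORDNO"),           # unrelated field
]

def trace(rows, field):
    """Return every name reachable from `field` and the source lines involved."""
    graph = {}
    for _, _, a, b in rows:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {field}, deque([field])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    hits = sorted({(m, l) for m, l, a, b in rows if a in seen or b in seen})
    return seen, hits

names, hits = trace(ROWS, "CUSNO")
for member, line in hits:
    print(member, line)
```

Note that CUSLET appears in the result even though the only link to the database field is the parameter passed on the call, which is the behavior described for Figures 4 through 6.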
Making the required changes programmatically. Changes that can be made without causing any conflicts can be done programmatically. The percentage of these against the total changes required may vary from project to project, but essentially this task can be fully automated with a carefully written program. The tedious and time-consuming part of writing a program to do this is accounting for all instances or specific types of change. Nevertheless, these programmatic changes can provide a significant productivity gain in any project. There are different standards that can be used to notate and make the changes, such as making comments in margins, commenting out replaced code, or just overwriting existing code. This can be done one way during iterative trial conversions and then changed for a production conversion with little effort.
Note: It may be desirable to retain the original code as comments during the project but remove it prior to the final production implementation.
These programmatic changes can be categorized into two types:
Direct Definition Changes: Direct definition changes can be made where database fields or variables that can be traced back to database fields are defined. This includes files, displays, reports, and programs (RPG or CL) and refers to D-specs, arrays, and in-line calc specs, amongst others. This type of change is straightforward and is the most obvious candidate for programmatic change. Figure 7 shows the source of a physical file that has been programmatically updated and has had the original code commented out. Columns 1-5 have had the programmer's name added for audit purposes.
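A direct definition change of this kind might be sketched as follows. This is an illustration only, not the conversion program behind Figure 7. It assumes the usual DDS positional layout (form type in column 6, comment asterisk in column 7, field name in columns 19-28, length in columns 30-34, data type in column 35), and the helper names dds_field and expand_field, as well as the audit tag "RC", are made up for the example.

```python
def dds_field(name, length, dtype, decimals=""):
    """Build a DDS field line using the assumed column layout."""
    # cols 1-5 blank, col 6 'A', cols 19-28 name, 30-34 length, 35 type, 36-37 decimals
    return f"{'':5}A{'':12}{name:<10} {length:>5}{dtype}{str(decimals):>2}"

def expand_field(line, field, new_length, tag):
    """Return the line unchanged, or [commented original, resized copy]."""
    is_comment = line[6:7] == "*"
    if is_comment or line[18:28].strip() != field:
        return [line]
    audit = tag[:5].ljust(5)                      # columns 1-5 carry the audit tag
    old = audit + line[5:6] + "*" + line[7:]      # comment out the original line
    new = audit + line[5:29] + str(new_length).rjust(5) + line[34:]
    return [old, new]

source = [dds_field("CUSNO", 5, "P", 0), dds_field("ORDNO", 7, "P", 0)]
converted = [out for line in source for out in expand_field(line, "CUSNO", 7, "RC")]
print("\n".join(converted))
```

Dropping the commented-out copy for the production run would only require returning the new line instead of both.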
Indirect Definition Changes: In some cases, direct definition changes have a "knock-on" effect. For example, if a field is expanded by two digits, and this field is used before the end of an internal data structure in an RPG program, the other elements in the data structure must be adjusted to accommodate this change. Similarly in a print file format, a column-size increase may require columns to the right to be shifted to make space. In some cases this "knock-on" effect may actually cause conflicts of various types. These conflicts might be resolved by using clever algorithms in the programs that make the changes, but usually conflicts require human intervention. Figure 8 shows an example of how the data structure definition is adjusted, the second element is expanded, and subsequent elements are moved to accommodate this change. This type of change is fairly straightforward to program into the automated process. The time-consuming part is finding and allowing for all different types of patterns of instances in a system. As such, the repeated use and fine-tuning of programs that make changes to programs makes them naturally more useful with each successive project.
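The knock-on adjustment in a data structure can be sketched in a few lines of Python. The subfield names and positions below are invented for illustration, and the function name expand_subfield is hypothetical. Subfields are held as (name, from, to) positions; anything that overlaps the expanded subfield is reported as a conflict rather than moved, since that case usually needs a human decision.

```python
def expand_subfield(subfields, target, delta):
    """Widen `target` by `delta` and shift every subfield that follows it."""
    positions = {name: (start, end) for name, start, end in subfields}
    t_start, t_end = positions[target]
    result, conflicts = [], []
    for name, start, end in subfields:
        if name == target:
            result.append((name, start, end + delta))
        elif start > t_end:                       # entirely to the right: shift
            result.append((name, start + delta, end + delta))
        elif end < t_start:                       # entirely to the left: unchanged
            result.append((name, start, end))
        else:                                     # overlays the target: flag it
            conflicts.append(name)
            result.append((name, start, end))
    return result, conflicts

ds = [("DSCODE", 1, 2), ("DSCUSNO", 3, 7), ("DSNAME", 8, 37)]
new_ds, conflicts = expand_subfield(ds, "DSCUSNO", 2)
print(new_ds, conflicts)
```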
Managing design conflicts and manual intervention. In virtually every field-expansion project, there will be design problems that arise from the proposed changes. These might vary from a simple overlay or overflow on a report to embedded business logic based on a field substring. Although it may be impossible to automatically make changes to these constructs, it's possible to programmatically identify where they occur. Again, the role of the prebuilt application map is critical to this process as a primary input to the search algorithms. These conflicts can be clearly identified by subtracting the changes made programmatically from the total required changes. These conflicts can be generally categorized as follows:
Device Problems: Device problems are those in which any direct change or shuffling of affected columns runs out of space.
Program Problems: An example of a program problem is lines where there may be a conversion problem because a resized field (or a field dependent upon it) is a subfield in a structure that can't be resized. Another example is when a work field is used in a program by two fields at various stages. One field is being resized, and the other isn't. Again this requires design logic to resolve.
Database Problems: The whole process of solving a field-expansion problem starts by specifying which fields will be changed. The where-used analysis, when run on resized database fields, might trace to fields not included in the resize exercise. This may or may not be a problem but generally must be assessed manually.
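As a rough illustration of how such conflicts can be isolated, the Python sketch below subtracts the programmatic changes from the total required changes and adds a simple check for the device case. The member names, line numbers, report columns, and the 132-column page width are all hypothetical sample values.

```python
# Hypothetical (member, line) pairs from the analysis and from the conversion run.
required = {("ORDENT", 120), ("ORDENT", 245), ("CUSRPT", 30), ("CUSLET", 12)}
automated = {("ORDENT", 120), ("CUSLET", 12)}

# Whatever the programs could not change is left for manual review.
manual = sorted(required - automated)

def overflows(columns, target, delta, page_width):
    """True if widening `target` pushes the rightmost shifted column off the page."""
    starts = {name: start for name, start, _ in columns}
    rightmost = max(start + length - 1
                    for _, start, length in columns
                    if start >= starts[target])
    return rightmost + delta > page_width

report = [("CUSNO", 1, 5), ("NAME", 8, 30), ("BALANCE", 120, 11)]
print(manual)
print(overflows(report, "CUSNO", 3, 132))
```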
Some of these problems might be resolved by making some manual changes before rerunning the analysis and programmatic changes. In certain cases, this process might have an exponential effect of removing problems with a conversion project. In other cases it will be necessary to make these design decisions and changes after the completion of the programmatic changes. The objective of this stage is an optimum result combining programmatic changes with whatever manual intervention is deemed necessary.
The automated nature of this process allows for the latest version of the source code to be brought in and run through the first three stages. It's also only at this stage that formal software configuration management (SCM) policies and procedures need to be implemented.
In many cases, no conversion or change will take place, but a recompile will be needed. Again the application map can be used to good effect here. Simply building recompile lists based on the converted source code and all related objects from the where-used information will help ensure that nothing is missed. It also means that simple CL programs can be written to bulk recompile and incorporate any compilation strings in the compile commands.
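Here is one way a recompile list could be derived, sketched in Python. The where-used dictionary, the object names, the source file names (QDDSSRC, QRPGLESRC, QCLSRC), and the library DEVLIB are assumptions for the example. The generated strings use only the basic object and source parameters of CRTPF, CRTBNDRPG, and CRTCLPGM; any site-specific compile options, and the copying of existing data, are left out.

```python
# Hypothetical where-used extract: object -> objects that depend on it.
WHERE_USED = {"CUSTS": ["ORDENT", "CUSRPT"]}
TYPES = {"CUSTS": "PF", "ORDENT": "RPGLE", "CUSRPT": "RPGLE", "CUSLET": "CLP"}

COMMANDS = {
    "PF": "CRTPF FILE({lib}/{name}) SRCFILE({lib}/QDDSSRC) SRCMBR({name})",
    "RPGLE": "CRTBNDRPG PGM({lib}/{name}) SRCFILE({lib}/QRPGLESRC) SRCMBR({name})",
    "CLP": "CRTCLPGM PGM({lib}/{name}) SRCFILE({lib}/QCLSRC) SRCMBR({name})",
}

def recompile_set(converted, where_used):
    """Converted members plus everything that depends on them, transitively."""
    todo, stack = set(), list(converted)
    while stack:
        obj = stack.pop()
        if obj not in todo:
            todo.add(obj)
            stack.extend(where_used.get(obj, ()))
    return todo

def compile_commands(objects, types, lib):
    """Files first, then programs, each in name order."""
    rank = {"PF": 0, "RPGLE": 1, "CLP": 1}
    ordered = sorted(objects, key=lambda o: (rank[types[o]], o))
    return [COMMANDS[types[o]].format(lib=lib, name=o) for o in ordered]

cmds = compile_commands(recompile_set(["CUSTS", "CUSLET"], WHERE_USED), TYPES, "DEVLIB")
print("\n".join(cmds))
```

ORDENT and CUSRPT end up on the list even though their source was not converted, because the where-used information shows they reference the changed file.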
Structural changes to an application can be a key part of a company's modernization strategy. Some of these structural changes are motivated by more strategic objectives, such as agile development, reusable architecture, and functional redesign. Other modernization projects are driven by more commercial demands, such as internationalization.
Unicode conversions. An increasingly popular modernization requirement on IBM i is Unicode conversion. The principle of a Unicode conversion is largely the same as that of a field-expansion project: changing the attributes of database and display fields and updating all affected logic in the programs. There are some differences in the process and requirements, but the same approach can generally be followed. Indeed, the same programs used for field expansion can be enhanced to accommodate Unicode conversions without too much work.
Let's look at some simple examples of what could be changed programmatically with a Unicode conversion. The first aspect is updating the fields in the files and displays. This sort of change is consistent with the field-expansion algorithms mentioned earlier in this article. Figure 9 shows how the field definition for the COMPANY field has been updated to a type G, and the desired Unicode Coded Character Set Identifier (CCSID) has been specified in the function column for this field.
Figure 10 shows how the H-spec of an RPGLE program has been automatically updated with the requisite CCSID code. In this instance, the CCSID H-spec keyword is used to set the default UCS-2 CCSID for RPGLE modules and programs. These defaults are used for literals, compile-time data, program-described input and output fields, and data definitions that don't have the CCSID keyword coded.
Figure 11 shows how, by using a fairly straightforward algorithm, your automated program can intervene in your C-specs and automatically convert statements to include the %UCS2 built-in function (BIF) where required. In this example, as with the field-expansion samples, old lines have been commented out to show what each programmatically created new line replaced.
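The kind of algorithm involved can be sketched in Python as follows. To keep the example short it assumes free-format assignment statements rather than the fixed-format C-specs shown in the figure, and the function name wrap_ucs2 and the sample statements are hypothetical. The rule is simply: if the target of an assignment is a field that is now Unicode and the source is not already Unicode, wrap the source in %UCS2 and keep the old line as a comment.

```python
import re

# Matches simple free-format assignments such as "  COMPANY = 'ACME';"
ASSIGN = re.compile(r"^(\s*)(\w+)\s*=\s*(.+?);\s*$")

def wrap_ucs2(line, unicode_fields):
    """Return the line unchanged, or [commented original, converted line]."""
    m = ASSIGN.match(line)
    if not m:
        return [line]
    indent, target, source = m.groups()
    if target.upper() not in unicode_fields:
        return [line]
    already = source.upper().startswith("%UCS2(") or source.upper() in unicode_fields
    if already:
        return [line]
    return [f"{indent}// {line.strip()}",
            f"{indent}{target} = %ucs2({source});"]

fields = {"COMPANY"}  # fields the analysis marked as converted to Unicode
for out in wrap_ucs2("  COMPANY = 'ACME';", fields):
    print(out)
```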
There are two important points to make regarding Unicode conversions:
Externalizing database I/O. Another increasing trend in the IBM i application space is the need to separate out I/O logic from legacy programs. One primary motivation for this trend is the necessity for making significant changes to database architecture without interrupting proven process and business logic.
Another business driver for this trend is from companies replacing legacy custom software with off-the-shelf applications but wanting to keep certain core functions running as is, at least for a period of time. In this scenario, mapping to the replacement database architecture can be carried out without interruption to critical legacy functions, provided of course that the database I/O has been externalized from the legacy programs first.
The algorithms used by programs that would automatically make such a change would be different from a field-expansion process, but once again the core asset here would be the application map for the initial analysis. These reengineering programs can then be designed to identify and convert all source code instructions needed to transfer file I/O into external modules, giving identical functionality.
Thus the code in Figure 12 shows how an I/O statement is replaced with a procedure.
Another requirement of the reengineering programs is to automatically build fully functional I/O modules, which can then be adapted to a radically changed database, with no impact on the reengineered RPG code—the module returns a buffer identical to the original file. So if you wanted to switch to a completely new customer file, you could simply change the I/O module code (as shown in Figure 13), and the hundreds of RPG programs using the CUSTS file would require no source changes whatsoever!
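The design idea behind the I/O module can be illustrated outside RPG. In the Python sketch below, CustsRecord stands in for the record buffer of the original CUSTS file, custs_chain stands in for the generated I/O procedure, and NEW_CUSTOMERS is an invented replacement database with a different shape. None of these names come from the figures. The point is that the caller only ever sees the original layout.

```python
from dataclasses import dataclass

@dataclass
class CustsRecord:
    """Buffer with the same fields as the original CUSTS record format."""
    cusno: int
    name: str

# Hypothetical replacement database with a different structure.
NEW_CUSTOMERS = {1001: {"customer_id": 1001, "first": "Ann", "last": "Lee"}}

def custs_chain(cusno):
    """I/O module entry point that replaces a keyed read of CUSTS."""
    row = NEW_CUSTOMERS.get(cusno)
    if row is None:
        return None                                # record not found
    # Map the new structure back onto the original buffer layout.
    return CustsRecord(cusno=row["customer_id"],
                       name=f"{row['first']} {row['last']}")

def print_letter(cusno):
    """Stands in for reengineered business logic; it never touches the database."""
    rec = custs_chain(cusno)
    return "not found" if rec is None else f"Dear {rec.name}"

print(print_letter(1001))
```

Switching to yet another customer store would mean changing only custs_chain, which mirrors the claim that the calling programs need no source changes.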
Refactoring monolithic code into services. Another important way of using programs to update programs is in the area of building services from legacy application code. There are many articles and guidelines from leading thinkers, such as Jon Paris, Susan Gantner, and others, on the subject of using subprocedures over subroutines. This is fine for new applications, but most interactive legacy programs are written in a monolithic style, which can severely limit long-term modernization opportunities, not to mention add significant stress and complexity to ongoing maintenance and development tasks in general.
By advancing the algorithms of replacement and code regeneration described in all three areas here, it's possible to refactor monolithic programs by externalizing the subroutines into procedures automatically. Breaking up the program into two components like this makes the rewrite of the user interface layer easier but simultaneously makes available externalized subprocedures as callable services. This is a great way to start on a staged application reengineering while realizing immediate benefit.
Figure 14 shows two subroutines, VALID1 and VALID2, being invoked in a monolithic legacy program called WWCUSTS. A "reworked" program was written, using similar logic to the field-expansion and I/O externalization programs mentioned earlier, to create a new Business Logic module that would contain procedures created from all the legacy subroutines in the original programs. Figure 15 shows the definition for the wwcustsvalid1 procedure in this new ILE module WWCUSTSB.
The reworked program updated the original program to use the service program WWCUSTSB, invoking the appropriate procedure instead of the subroutine and passing the correct parameters. The reworked program also created the necessary prototypes in the updated WWCUSTS program, as Figure 16 shows.
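The call-replacement step can be sketched in Python as follows. This is a simplified illustration, not the reworked program itself: it assumes free-format EXSR statements, derives the procedure name from the program and subroutine names in the same way as wwcustsvalid1, and leaves out parameter passing and prototype generation, which the real conversion has to work out from the fields each subroutine uses.

```python
import re

# Matches free-format subroutine calls such as "  exsr VALID1;"
EXSR = re.compile(r"^(\s*)exsr\s+(\w+)\s*;\s*$", re.IGNORECASE)

def externalize_calls(lines, program):
    """Replace EXSR statements with procedure calls; list the procedures needed."""
    out, procedures = [], []
    for line in lines:
        m = EXSR.match(line)
        if m:
            proc = f"{program}{m.group(2)}".lower()
            out.append(f"{m.group(1)}{proc}();")   # parameters omitted in this sketch
            if proc not in procedures:
                procedures.append(proc)
        else:
            out.append(line)
    return out, procedures

source = ["  exsr VALID1;", "  exsr VALID2;", "  total = total + 1;"]
new_source, procs = externalize_calls(source, "WWCUSTS")
print("\n".join(new_source))
print(procs)
```

The list of procedure names is what a later step would use to generate the prototypes in the calling program and the procedure shells in the new module.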
Using programs to update programs is not a new or even an unusual technique. Combined with a very detailed application map of an entire system, this approach to system engineering can help solve the problem of modernizing and enhancing large and complex legacy applications using limited resources in shorter timeframes. For many companies, this approach has saved millions of dollars in development costs and has also provided a means to bring legacy application code into the world of modern architectures and techniques.
In the next article in this series, I look at how to extract design model assets from legacy systems. I cover areas such as relational data models and business rules and how these make legacy applications relevant in a modern context.
Robert Cancilla spent the past four years as market manager for IBM's Rational Enterprise Modernization tools and compilers group for IBM i. Prior to that, Robert spent 34 years as an IT executive for three major insurance companies and an insurance software house. He has written four books on e-business for the AS/400 or iSeries and founded and operated the electronic user group IGNITe/400. Robert is currently retired and does independent consulting work.
Here are two useful IBM Redbook publications that cover almost all the subjects discussed in this article:
Modernizing IBM eServer iSeries Application Data Access - A Roadmap Cornerstone
Modernizing and Improving the Maintainability of RPG Applications Using X-Analysis Version 5.6