Application Mapping on IBM i: Present and Future

New technologies and concepts augment IBM i app dev and modification

There is a great deal of uncertainty about the future of IBM i, including mixed messages from IBM following the consolidation of Systems i and p into the new Power Systems brand. Although IBM may have weakened the public perception of the brand, the hardware and software still deliver what they always have—rock-solid reliability and dependable applications.

So although the world has forgotten about the "AS/400" and green screens, there are still huge code bases written over the past 10–40 years (RPG celebrated its 40th birthday in 2009) powering corporations of all sizes. The investment this technology represents can't simply be replaced with packaged ERP software or quickly rewritten in a new language or framework. The fact that these systems are still running is a testament to the success of the platform and its development ecosystem in general. This is a point that seems lost on the wider development and business community. There is simply no other system that supports—in their original form—applications written more than 40 years ago, without source-code modification.

The challenge for today's IBM i sites is how to retain sufficient development resources to maintain and develop the applications as the number of active RPG developers diminishes through promotion, retirement, and natural attrition. There has to be a way of enabling new people to understand quickly and accurately the complexities and subtleties of these sometimes vast systems and give them the confidence to make changes and extend these systems, even though they will never have developed anything like them themselves. This article is the first in a series that describes this growing challenge in some detail and also explains how new technologies and concepts are evolving to provide solutions and bolster IBM i development.

A typical application on IBM i could be anything from a few thousand to many millions of lines of code, with all the complexity, design inconsistencies, languages, syntaxes, and semantics that go with years of ongoing development. Mission-critical applications consist of a great many physical files or tables, and programs. The interdependencies of program-to-file and file-to-program alone can easily reach hundreds of thousands. We're not talking about the abstracted or esoteric nature of individual pieces of technology here, but entire business systems.

As with any successful management system, the key is information about your systems. The level of detail and availability of this information is another critical factor, which has already been proven in business by the success of Enterprise Resource Planning (ERP) and business systems in general. The requirement is not a new one but is becoming more universal as systems continue to grow and mature. A key question is how to manage the cost and risk of maintaining and modernizing these systems.

Let's examine how application mapping has become a core solution to the problem. Application mapping means analyzing and extracting a database of information about the resources that constitute a business application system.

Making Informed Decisions

Mapping an entire application provides a baseline of information for all sorts of metrics and analysis. Counting objects and source lines is the most common practice for obtaining system-wide metrics. Many companies carry out software project estimations and budgeting using only this type of information. To some degree, the experience and technical knowledge of a manager and staff might help in getting accurate numbers, but more often than not, the result is guesswork.

A slightly more advanced approach used with RPG or Cobol applications is to dig more deeply into the application and count design elements within the programs themselves. These elements include

  • files
  • displays
  • subfiles
  • source lines
  • subroutines
  • called programs
  • calling programs

By using a simple formula to allocate significance to the count of an element, you can categorize programs by their respective counts into low, medium, and high complexities. This type of matrix-based assessment, which Figure 1 shows, is still fairly crude but adds enough detail to make estimations and budgeting much more accurate without too much additional effort.
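The matrix-based assessment described above can be sketched in a few lines. This is a minimal illustration only: the weights, the thresholds, and the program names (ORDENT, CUSTINQ) are hypothetical, and any real formula would have to be calibrated against the site's own project history.

```python
# Hypothetical weights: how much each design element adds to a program's score.
WEIGHTS = {
    "files": 3,
    "displays": 4,
    "subfiles": 5,
    "subroutines": 1,
    "called_programs": 2,
    "calling_programs": 2,
}
LINES_PER_POINT = 100   # assumption: every 100 source lines adds one point
LOW_MAX = 30            # assumption: scores up to 30 are "low"
MEDIUM_MAX = 80         # assumption: scores up to 80 are "medium"


def complexity_score(counts: dict) -> int:
    """Weighted sum of the element counts for one program."""
    weighted = sum(WEIGHTS[name] * counts.get(name, 0) for name in WEIGHTS)
    return weighted + counts.get("source_lines", 0) // LINES_PER_POINT


def complexity_band(score: int) -> str:
    """Place a score into the low / medium / high matrix."""
    if score <= LOW_MAX:
        return "low"
    if score <= MEDIUM_MAX:
        return "medium"
    return "high"


# Illustrative counts, not taken from any real system.
programs = {
    "ORDENT": {"files": 6, "displays": 1, "subfiles": 2, "source_lines": 2500,
               "subroutines": 20, "called_programs": 5, "calling_programs": 1},
    "CUSTINQ": {"files": 2, "displays": 1, "source_lines": 400, "subroutines": 3},
}

for name, counts in programs.items():
    score = complexity_score(counts)
    print(name, score, complexity_band(score))
```

Keeping the weights in one table makes the assumptions behind an estimate visible and easy to audit.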

Another common practice is to take small representative samples, such as those selected for a proof-of-concept (POC), do project estimations, and then extrapolate this information in a simplistic linear way across the entire system or for an entire project. This method naturally relies upon the assumption that design, style, and syntax for the entire application are consistent with the samples used for the POC. The reality is that samples are most often selected for POCs based on functionality rather than complexity. Sometimes the opposite is true, whereby the most complex example is selected on the basis of "if it works for that, it'll work for anything."

Calculations that use comprehensive and accurate metrics data for an entire application, rather than data from a sample, markedly improve the reliability of time and cost estimation. Risk is not entirely removed, but plans, estimates, and budgets can be more accurately quantified, audited, and even reused to measure performance of a project or process.

Some more advanced techniques to measure application complexity are worth mentioning. If such techniques are used over an application map, a number of very useful statistics and metrics can be calculated, including detailed testing requirements and a "maintainability index" for entire systems or parts thereof.

Building Application Maps

As application knowledge is lost and not replaced, the cost of ownership of these large, complex IBM i applications increases, and maintenance becomes more risky. The CL command Display Program References (DSPPGMREF) provides information about how a program object relates to other objects in the system. Figure 2 shows an example of DSPPGMREF's output. The information is useful in determining how a program relates to other objects. It is possible to extract this information and store it in a file, as Figure 3 shows, and then carry out searches on this file during analysis work.
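Once the reference information has been extracted to a file, searching it is straightforward. The sketch below assumes the extract has been exported as comma-separated text; the column names (program, object, object_type, usage), the usage values, and the sample rows are simplified stand-ins for illustration, not the actual layout of the DSPPGMREF output.

```python
import csv
import io

# Illustrative extract; a real one would be read from the exported file.
SAMPLE = """program,object,object_type,usage
CUSTMNT,CUSTS,*FILE,update
CUSTMNT,SLMEN,*FILE,update
CUSTMNT,CUSTDSP,*FILE,display
CUSTINQ,CUSTS,*FILE,input
CUSTINQ,CUSTMNT,*PGM,call
"""


def load_references(text: str) -> list:
    """Parse the extract into a list of dictionaries, one per reference."""
    return list(csv.DictReader(io.StringIO(text)))


def where_used(refs: list, object_name: str, usage: str = None) -> list:
    """Return the programs that reference an object, optionally by usage."""
    return sorted({
        row["program"]
        for row in refs
        if row["object"] == object_name
        and (usage is None or row["usage"] == usage)
    })


refs = load_references(SAMPLE)
print(where_used(refs, "CUSTS"))            # every program touching CUSTS
print(where_used(refs, "CUSTS", "update"))  # only the programs that update it
```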

A much more efficient way of presenting the same information, however, is to show it graphically. Additional information, such as the directional flow of data, can be added to diagrams easily. Systems design and architecture is best served using diagrams. Color coding within these constructs is also important because it helps people assimilate structure and logically significant information more quickly. A good example of using a diagram for more effective communication is to use it to show where program updates take place, for example, by using the color pink, as Figure 4 shows (SLMEN and CUSTS are the two updated tables in this program).

Embedding other important textual information such as an object's text into or along with diagrams is another way of presenting information effectively and efficiently. In Figure 4, you see how graphical and textual information combine to provide rich information about the program references. The diagram also uses arrows to show the flow of data between the program and the other objects.

Tom DeMarco, the inventor of the data flow diagram concept, stated that what is critical is the flow of data through a system. Application mapping information can be extended, as Figure 5 shows, to simultaneously include details about individual variables associated with each of the referenced objects. In the case of a program-to-program relationship, the method used to extract this level of precise variable detail is to scan the source code of the programs and establish which entry parameters are used.

In a program-to-file relationship, the diagramming job is somewhat more tedious because you must look for instances in which database fields and corresponding variables are used throughout the entire program. Also useful is seeing where individual variables are updated as opposed to being used just as input. The diagram now presents a rich set of information in a simple and intuitive way. The amount of work to extract and present this level of detail in a diagram can quickly become prohibitive, so the task is therefore better suited to a tools-based approach rather than to manual extraction.

Figure 6 shows a program-centric diagram. The same diagram in which the file is the central object being referenced is also useful in understanding and analyzing complex applications. The same diagrammatic concepts can be used: color coding for updates, arrows for data flow, and simultaneous display of detailed variables. By using the same diagram types for different types of objects in this way, the same skills and methods can be reused to twice the effect. Figure 5 shows how additional information, such as related logical files (displayed as database shapes), can be added and easily recognized by using different shapes to depict different object types. Application mapping and formal metric analysis are attributed to Thomas J. McCabe Sr. (1976) and Maurice Howard Halstead (1977); see "Calculating Complexity," below.

Functionally Organizing an Application

Single-level information about an RPG or Cobol program is obviously not enough to understand a business system's design. You need to be able to follow the logical flow downward through the application. You can use the DSPPGMREF output to do this. If you start at program A and see that it calls program B, you can then look at the DSPPGMREF information for program B, and so on. Additionally, you can deduce precisely in this structure where and how data, print, and display files are being used in the call stack, which is very useful for testing and finding bugs that produce erroneous data.

For large, complicated systems, this can be a slow and tedious process if done manually using the display output of DSPPGMREF. Extracting all programs' DSPPGMREF information out to a single file makes it possible to recursively query the file to follow the calls down successive levels, starting at a given program. This process can then show the entire call stack or structure chart for all levels, starting at a given program or entry point.
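The recursive query described above amounts to a depth-first walk over caller/callee pairs. The following sketch assumes the pairs have already been pulled out of the extracted file; the program names are hypothetical, and the guard against recursive calls is an addition for safety rather than something the extract provides.

```python
from collections import defaultdict

# Illustrative (caller, callee) pairs taken from the program references.
CALLS = [("A", "B"), ("A", "C"), ("B", "D")]


def build_call_map(pairs):
    """Index the pairs so each caller maps to the programs it calls."""
    call_map = defaultdict(list)
    for caller, callee in pairs:
        call_map[caller].append(callee)
    return call_map


def call_tree(call_map, program, level=0, path=()):
    """Return (level, program) rows for the structure chart under a program."""
    rows = [(level, program)]
    if program in path:
        # The program is already on the current call path: stop descending
        # so that recursive call chains cannot loop forever.
        return rows
    for callee in call_map.get(program, []):
        rows.extend(call_tree(call_map, callee, level + 1, path + (program,)))
    return rows


for level, program in call_tree(build_call_map(CALLS), "A"):
    print("  " * level + program)
```

The indented output is a plain-text version of the structure chart; the same rows could feed a diagramming tool.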

A given program's call stack or call structure can be represented much more effectively diagrammatically than with any textual description alone. Quite often, these call stacks may go down as many as 15 levels from a single starting point. Therefore, being able to display or hide details according to the information required at the time is important, along with having search facilities built in to the application map that supports the diagrams.

As with other diagrams, color coding plays an important role in classifying objects in the stack by their general use, such as update, display, input only, and so on. Figure 7 shows the structure of a program as seen graphically. Additional information, such as what data files, displays, and data areas are used by each object, can be added to enrich the information provided.

This diagram alone, however, doesn't tell you where you are in relation to the overall hierarchal structure of the application. You don't know whether the program is an entry point into the system or is buried in the lower levels of the application.

For better understanding of an entire system, therefore, objects need to be organized into functional groups or areas. This can be achieved by using naming conventions, provided that they exist and are consistent across the application. The entry points into the application need to be established. Sometimes a user menu system is useful for this but is not necessarily complete or concise enough. One way to establish what programs are potential entry points is to determine each program's call index. If a program isn't called anywhere but does call other programs, it can essentially be classed as an entry point into the system. If a program is called, and in turn if it calls other programs itself, it's not an entry point.
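The entry-point rule above reduces to a set difference over the caller/callee pairs. In this sketch the program names are hypothetical, and "call index" is interpreted as the number of references that call a program, which is an assumption about the term.

```python
from collections import Counter

# Illustrative (caller, callee) pairs; names are hypothetical.
CALL_PAIRS = [
    ("MENU1", "ORDENT"),
    ("ORDENT", "PRICE"),
    ("MENU2", "CUSTINQ"),
    ("CUSTINQ", "PRICE"),
]


def call_index(pairs):
    """Count how many call references point at each program."""
    return Counter(callee for _, callee in pairs)


def entry_points(pairs):
    """Programs that call others but are never called themselves."""
    callers = {caller for caller, _ in pairs}
    called = {callee for _, callee in pairs}
    return sorted(callers - called)


print(entry_points(CALL_PAIRS))
print(call_index(CALL_PAIRS)["PRICE"])
```

Programs invoked only from menus, job schedulers, or commands will also show up as entry points here, so the list is best treated as a set of candidates to review.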

A functional area can be mapped by selecting an entry point (or a group of them) and then using the underlying application map to include all objects (everything, including programs, files, displays) in the call stack. Figure 8 shows a diagram of a series of entry points and their relative call stacks grouped as a functional area.
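Collecting a functional area is a reachability problem: start from the chosen entry points and gather every object found while following program calls. The sketch below is one possible approach; the tuple layout and object names are hypothetical, and only objects typed *PGM are followed further.

```python
from collections import defaultdict, deque

# Illustrative references: (program, referenced object, object type).
REFS = [
    ("ORDMENU", "ORDENT", "*PGM"),
    ("ORDENT", "ORDHDR", "*FILE"),
    ("ORDENT", "PRICE", "*PGM"),
    ("PRICE", "PRICES", "*FILE"),
    ("CUSTINQ", "CUSTS", "*FILE"),
]


def functional_area(refs, entry_points):
    """Return every object reachable from the given entry points."""
    uses = defaultdict(list)
    for program, obj, obj_type in refs:
        uses[program].append((obj, obj_type))

    area = set(entry_points)
    queue = deque(entry_points)
    while queue:
        program = queue.popleft()
        for obj, obj_type in uses.get(program, []):
            if obj in area:
                continue  # already collected, possibly via another path
            area.add(obj)
            if obj_type == "*PGM":
                queue.append(obj)  # called programs are explored in turn
    return area


print(sorted(functional_area(REFS, ["ORDMENU"])))
```

Because the result is a set, two functional areas can be intersected to find the objects they share.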

To more accurately describe an entire system's architecture, functional application areas might need to be grouped into other functional application areas. These hierarchal application areas can then be diagrammed, showing how they interrelate with each other. This interrelation can be hierarchal but also programmatic because some objects might be found in more than one application area simultaneously.

Figure 9 is a diagram showing how application areas interrelate. For the sake of clarity, the diagram includes only those programmatic interrelations from entry-level objects. The diagrams show how the accounting Main application area has other (e.g., B, A1) application areas embedded in it. The red lines show the programmatic links between objects within the application. In this example, the level of interrelation has been limited to programmatic links between entry-point programs and programs they call in other application areas. This is a good way of mapping business functional areas to application architecture in a simple diagram.

Logical subdivisions of an entire application are also being employed in other areas of application management. Some of these include

  • clear and concise allocation of responsibility for maintenance/support of a set of objects
  • integration with source change management tools for check-in and check-out processes during development
  • production of user documentation for support, training, and testing staff

Mapping Databases

An IBM i business application is primarily an application written over a relational database. Therefore, no map of an enterprise application would be complete without the database architecture explicitly specified—not just the physical specifications and attributes but the logical or relational constraints, too.

With the possible exception of CA 2E systems, virtually all RPG or Cobol applications running on IBM i have no explicit relational data model or schema defined. This means that millions of lines of RPG or Cobol code must be read in order to recover an explicit version of the relational model. What you need to know is what keys constitute these links or relationships between physical files or tables in the database.

The first task is to produce a key-map of all the primary keys and fields for all physical files, tables, logical files, access paths, and views in the database. By using a simple algorithm and looking at the DDS or DDL, you can often determine whether foreign-key relationships exist between files. Figure 10 shows a diagram of this simple algorithm using the database definitions themselves.
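As a rough illustration of working from a key-map, the sketch below flags a candidate foreign key whenever all the key fields of one file appear, with the same name, type, and length, among the fields of another file. This is an assumed heuristic, not a reproduction of the algorithm in Figure 10, and the file and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    length: int


# Illustrative key-map and field lists, as might be derived from DDS or DDL.
KEYS = {
    "CUSTS": [Field("CUSNO", "P", 7)],
    "ORDHDR": [Field("ORDNO", "P", 9)],
}
FIELDS = {
    "CUSTS": [Field("CUSNO", "P", 7), Field("CUSNAM", "A", 30)],
    "ORDHDR": [Field("ORDNO", "P", 9), Field("CUSNO", "P", 7)],
}


def candidate_foreign_keys(keys, fields):
    """Return (child, parent, key field names) for each candidate relationship."""
    candidates = []
    for parent, key_fields in keys.items():
        for child, child_fields in fields.items():
            if child == parent or not key_fields:
                continue
            # Every key field of the parent must appear unchanged in the child.
            if all(key in child_fields for key in key_fields):
                names = tuple(key.name for key in key_fields)
                candidates.append((child, parent, names))
    return candidates


print(candidate_foreign_keys(KEYS, FIELDS))
```

Matching on field names only works where naming is consistent across files, so the output is a list of candidates to confirm rather than a finished model.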

A more advanced and comprehensive approach for determining foreign key relationships is to analyze the program source code for the system. If you look at the source code of a program and see that more than one file/table is used, there's a possibility that these files are related by foreign key constraints. By finding instances in the program in which one of the files is accessed for any reason, and determining the keys used to do so, you can then trace these variables back through the code to keys in another file in the program. If at least one of the key fields matches the other file's field in attribute and size and is also part of that file's unique identifier, there is a strong likelihood of a relationship between the two files. By then looking at the data using these key matches, you can test whether the relationship actually holds. By cycling through all the files in the system one by one and testing for these matches with each and every other file, you can establish all the relationships.
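Testing a candidate relationship against the data can be as simple as counting child rows whose key has no match in the parent. The sketch below uses an in-memory SQLite database purely as a stand-in so that it runs anywhere; the table names, column names, and rows are hypothetical, and the table and column names are assumed to come from the trusted application map rather than from user input.

```python
import sqlite3


def orphan_count(conn, child, child_key, parent, parent_key):
    """Count child rows whose key value has no matching parent row."""
    sql = (
        f'SELECT COUNT(*) FROM "{child}" c '
        f'WHERE c."{child_key}" IS NOT NULL '
        f'AND NOT EXISTS (SELECT 1 FROM "{parent}" p '
        f'WHERE p."{parent_key}" = c."{child_key}")'
    )
    return conn.execute(sql).fetchone()[0]


# Illustrative data: the same customer number carries a different field name
# in each file, and one order points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTS (CUSNO INTEGER PRIMARY KEY, CUSNAM TEXT);
    CREATE TABLE ORDHDR (ORDNO INTEGER PRIMARY KEY, CUSTID INTEGER);
    INSERT INTO CUSTS VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO ORDHDR VALUES (100, 1), (101, 2), (102, 9);
""")

print(orphan_count(conn, "ORDHDR", "CUSTID", "CUSTS", "CUSNO"))
```

A count of zero supports the relationship; a small count may point to data-quality problems, and a large one suggests the key match is coincidental.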

This task is complicated generally by the fact that the same field in different files will usually have a different mnemonic name. When analyzing the program source, you'll have to deal with data structures, renames, prefixes, and multiple variables. If you have the program variable mapping information at your fingertips beforehand, the analysis process will be a lot quicker. The vast majority of this type of repetitive but structured analysis can be handled programmatically and thus enable completion of the task in a few hours rather than several months. Such automation naturally allows for keeping the relational model current at all times without huge overhead on resources.

Once explicitly defined, the relational model or architecture of the database can be reused in a number of scenarios, including

  • understanding application architecture
  • testing data quality for referential integrity
  • extracting test data
  • scrambling and aging test data
  • building BI applications and data warehouses
  • mapping data for system migrations
  • building object relational maps for modernization

Database access in all modern languages today is primarily driven by embedded SQL. IBM i legacy databases are typified by transaction-based table design with many columns and foreign key joins. This makes the task of writing SQL statements much more difficult and error prone unless the design of the database is clearly understood. It also creates an environment in which it's relatively easy for inexperienced developers or users to write I/O routines or reports that have an extremely negative performance impact. One way to combat this problem is to provide detailed design information about the database being accessed. Figure 11 shows a typical entity relationship diagram, and this can be accompanied with the underlying foreign key details, as Figure 12 shows.

Another, more generic approach to ensuring integrity of the database, guaranteeing productivity for modern technology developers, and limiting negative I/O performance impacts is to build a framework of I/O modules as stored procedures. The explicitly defined data model is a key source of information and will greatly simplify building of such a framework and can even be used to automate the generation of the framework itself.

It's also worth mentioning that products such as IBM's DB2 Web Query for i become far more useful and productive if the metadata layer is properly implemented. The derived data model can be used to build this metadata for the entire system.

Hard-Coding Application Knowledge

The output of DSPPGMREF is a great starting point for the type of mapping I've described so far. To produce the more detailed mappings and abstractions, however, the application source code itself needs to be read and analyzed.

From a design perspective, application software is made up of discrete layers or levels of detail. In an IBM i application for example, libraries contain programs, physical files, logical files, data areas, commands, and many more object types, and programs might contain file specs, variables, subroutines, procedures, display definitions, arrays, and various other language constructs. Data files have fields and text descriptions and keys and other attributes. Having an inventory of all these elements is useful—but only in a limited way, from a management perspective. What's needed is context. For example, mapping what files and displays are specified in a program helps you understand at an object level the impact of change. This rudimentary mapping provided by most program comprehension tools is limited in its usefulness because it still provides information at only a single level.

Mapping all levels of detail and how they interrelate with all other elements at all levels is the ultimate objective. The only way to achieve this is to read the source code itself line-by-line and infer all relationships implicit in each statement or specification. Naturally, the mapping process must allow for variants of RPG, Cobol, and CL going back 20 years, if it is to be useful for the vast number of companies that have code written 20 years ago in their mix. Relatively few humans have such knowledge or skill and, as I've mentioned, few people could keep up with the workload required for even the most modest of IBM i applications. Computer programs can be "taught" such knowledge and retain it permanently. Such programs can also be reused as often as necessary to keep abreast of any code changes that occur.

Prebuilding the application map and storing it in an open and accessible format, such as a spreadsheet in Google Docs, is also an important aspect of the overall usefulness of such information. Figure 13 shows the output of DSPPGMREF uploaded into a Google Docs spreadsheet and being filtered. Having the map available provides for any number of complex, system-wide abstractions or inquiries at acceptable speeds.

For a complete and accurate application map, you have to follow the trail of inferred references described in the programs themselves. This is obviously a labor-intensive task made all the more difficult by common coding practices, such as

  • overriding the database field name in a CL program
  • prefixing fields from a file being used in an RPG program
  • moving values from database fields into program variables before passing them as parameters to called programs
  • changing key field names between different database files
  • passing the name of the program to be called as a parameter to a generic calling program rather than making a direct call

If the prebuilt application map includes all these inferred logical references, measurement of impact can be complete and, more important, instant. It also means that higher-level analysis of rules and model-type designs is easier by virtue of the easy availability of variable- and object-level mapping.
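To show why prebuilt variable-level mapping makes impact analysis a simple lookup, here is a minimal sketch. It assumes the map already records, for each program variable, the database field it ultimately derives from; the tuple layout and all names are hypothetical, and building such a map from source code is the hard part that this sketch does not attempt.

```python
from collections import defaultdict

# Illustrative map rows: (program, program variable, source file, source field).
VARIABLE_MAP = [
    ("ORDENT", "OH_CUSNO", "ORDHDR", "CUSNO"),  # field prefixed in the program
    ("ORDENT", "WKCUST", "ORDHDR", "CUSNO"),    # value moved to a work variable
    ("PRICE", "P_CUST", "ORDHDR", "CUSNO"),     # received as an entry parameter
    ("CUSTINQ", "CUSNO", "CUSTS", "CUSNO"),
]


def impact_of(file_name, field_name, variable_map):
    """Return the programs and variables affected by a change to one field."""
    impacted = defaultdict(set)
    for program, variable, src_file, src_field in variable_map:
        if (src_file, src_field) == (file_name, field_name):
            impacted[program].add(variable)
    return {program: sorted(names) for program, names in impacted.items()}


print(impact_of("ORDHDR", "CUSNO", VARIABLE_MAP))
```

Note that PRICE appears in the result even though it never opens ORDHDR, which is exactly the kind of inferred reference that object-level mapping alone would miss.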

Moving Forward with Confidence

Application mapping provides a new way to manage and modernize complex business applications. It's also a way to facilitate collaboration between modern and legacy developers. Think about what computerized mapping has done for navigational and guidance systems in our day-to-day lives and travels. Similarly, application mapping provides a strong platform for a number of benefits and technologies that will continue to evolve for many years. I'll discuss these subjects further in upcoming articles in this series.

Robert Cancilla spent the past four years as market manager for IBM's Rational Enterprise Modernization tools and compilers group for IBM i. Prior to that, Robert spent 34 years as an IT executive for three major insurance companies and an insurance software house. He has written four books on e-business for the AS/400 or iSeries and founded and operated the electronic user group IGNITe/400. Robert is currently retired and does independent consulting work.


Sidebar: Calculating Complexity

Halstead Complexity Metrics
Halstead complexity metrics were developed by the late Maurice Halstead as a means of measuring a program module's complexity quantitatively, directly from the operators and operands in its source code. They are among the earliest formal software metrics. Because they analyze actual source code, they are strong indicators of code complexity, and they are most often used as maintenance metrics. See en.wikipedia.org/wiki/Halstead_complexity_measures for more information.
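The basic Halstead measures follow from four counts: distinct operators (n1), distinct operands (n2), total operators (N1), and total operands (N2). The sketch below applies the standard formulas; the sample counts are made up, and deciding what counts as an operator or operand in RPG, Cobol, or CL is left to the analysis tool.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Compute the basic Halstead measures from operator and operand counts."""
    vocabulary = n1 + n2                     # n = n1 + n2
    length = N1 + N2                         # N = N1 + N2
    volume = length * math.log2(vocabulary)  # V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)        # D = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume             # E = D * V
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Made-up counts for a small module.
print(halstead(n1=4, n2=4, N1=10, N2=6))
```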

Cyclomatic Complexity
Cyclomatic complexity is a software metric (measurement) developed by Thomas McCabe. It measures the amount of decision logic in a single software module. Cyclomatic complexity is used for two related purposes. First, it gives the number of recommended tests for software. Second, it is used during all phases of the software life cycle, beginning with design, to keep software reliable, testable, and manageable. Cyclomatic complexity is based entirely on the structure of software's control flow graph. See en.wikipedia.org/wiki/Cyclomatic_complexity for more information.
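For a control flow graph with E edges, N nodes, and P connected components, cyclomatic complexity is M = E - N + 2P. The sketch below computes it from a list of edges; the node names are hypothetical, and extracting the control flow graph from source code is assumed to have been done elsewhere.

```python
def cyclomatic_complexity(edges, connected_components=1):
    """Compute M = E - N + 2P from the edges of a control flow graph."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * connected_components


# Control flow of a single IF/ELSE: one decision, two paths.
IF_ELSE = [
    ("entry", "test"),
    ("test", "then"),
    ("test", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

print(cyclomatic_complexity(IF_ELSE))
```

Each additional decision adds one to the result, which is why the figure doubles as a count of the independent paths to test.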

—R.C.
