Improve the Modifiability of Your Cobol Applications

Article ID: 57943

In the late 1950s, the CODASYL committee designed Cobol with the objective of building a programming language that had the following characteristics:

  • Based on the English language (hence, easy to read)
  • Aimed toward business applications
  • Non-proprietary
  • Easy to maintain

The general perception is that the first three objectives were fully achieved but the fourth was not. Those of us who deal with legacy systems can attest that Cobol is not easy to maintain: many Cobol applications were built as monolithic entities that resist change. In this article, I discuss some of the reasons Cobol is so difficult to maintain and explain how we can improve its modifiability.

From Structured to Unstructured

A newly created Cobol program can be designed with careful attention to the rules of structured programming. However, business-critical Cobol code is modified many times, by many programmers. When those changes are made in different programming styles and under tight deadlines, the original structure of the code is often corrupted. Moreover, because the ultimate purpose of each change is either economic profit or an imperative legal requirement, programmers are pushed to implement it at minimum cost and in minimum time. These circumstances can lead to a sloppy "copy and paste" approach that disregards the original structure of the code.

What's more, many programmers find the new functionality easier to implement if they circumvent the rules of structured programming. I illustrate this point with an example taken from real life: Transactions are received from a business partner in a flat file that contains a header record (identified by the transaction code "H") and one or more detail records (identified by the transaction code "D"). The header record contains information such as the transaction date, a record count, and several checksum fields.

Now consider the following Cobol section, which is invoked with the statement PERFORM 01-FILE THRU FILE-EXIT:

01-file SECTION.

open-file.
   OPEN INPUT transactions.
   READ transactions
      INTO ws-header-area
         AT END CONTINUE.
   READ transactions
      AT END CONTINUE.
   
read-file.
   PERFORM UNTIL end-of-file
      ...
      READ transactions
         AT END CONTINUE
      END-READ
   END-PERFORM.

commit-changes.
   COMMIT.
	
close-file.
   CLOSE transactions.

file-exit.
   EXIT.

In the OPEN-FILE paragraph, the first READ statement reads the header record into a dedicated area of WORKING-STORAGE (WS-HEADER-AREA), and the second READ statement reads the first detail record. In the READ-FILE paragraph, each detail record is processed until the end-of-file condition is detected. Then the changes are committed and the file is closed.

The code was originally written with an "optimistic" approach (no exception or error handling). But soon the need arose to add some exception handling and error reporting routines. Sometimes, the file was received from the business partner without the header record. When that happened, it was impossible to process the file. The maintenance programmer was asked to fix that problem. To separate the main functionality from the error handling, the programmer decided to place the new paragraph at the end of the section:

01-file SECTION.

open-file.
   OPEN INPUT transactions.
   READ transactions
      INTO ws-header-area
         AT END CONTINUE.
   IF transaction-code NOT = "H"
      GO TO error-handling-paragraph.

   READ transactions
      AT END CONTINUE.
   
read-file.
   PERFORM UNTIL end-of-file
      ...
      READ transactions
         AT END CONTINUE
      END-READ
   END-PERFORM.

commit-changes.
   COMMIT.
	
close-file.
   CLOSE transactions.
   GO TO file-exit.

error-handling-paragraph.
   ...

   GO TO close-file.

file-exit.
   EXIT.

The addition of a single paragraph made the code more complex, and it now takes longer to understand the new control flow. Often, a relatively small modification involves adding several GO TO statements as shortcuts. The result is "spaghetti code," and the program drifts from structured to unstructured.

Coding Transaction-Processing Applications

Interactive Cobol applications can be viewed as event-driven programs. A display file record is written to the screen, and the program waits until the user does something at the keyboard, for example, presses a function key or chooses one of the options offered by the display file record. The program then branches to the appropriate module. In Model-View-Controller (MVC) terms, you could say that in interactive programs the Controller and the View are tightly coupled, as Scott Klement points out in his article "Writing Reusable Service Programs." A common construct for the Controller in transactional Cobol programs is the antiquated GO TO DEPENDING ON statement:

    GO TO
       label1
       label2
       label3
       label4
       label5
       . . .
       DEPENDING ON kbd-entry 

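In this example, the kbd-entry field contains a numeric value selected by the user from a menu: a value of 1 transfers control to label1, a value of 2 to label2, and so on, and an out-of-range value simply falls through to the next statement. Note that, unlike a PERFORMed paragraph, none of these labels returns control automatically; each target must branch somewhere else explicitly or fall through into whatever code follows it. This style of programming inevitably leads to unstructured code.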

The "Based on English Language" Paradox

In academic circles, it has been said that Cobol's syntax is horrible precisely because it was meant to be easy to read. Its designers thought that English-like semantics would let programmers without an engineering background write application software, ignoring the fact that the complexity of programming has little to do with syntax. Edsger Dijkstra discusses this in his paper "Two Views of Programming." As in English, a Cobol sentence consists of one or more statements followed by a period ("."). Early versions of the language had no explicit scope terminators (they arrived with COBOL-85), so programmers had to rely on the separator period to end the scope of a statement such as IF. Even today, many programmers do not use explicit scope terminators. Consider the following piece of code, which had to be amended during Y2K remediation:

IF ws-year < 30
   MOVE 20 TO ws-century
ELSE
   MOVE 19 TO ws-century
MOVE WS-DATE TO maturity-date
WRITE invoice-record.

The last two lines are the original code; the first four were added by the maintenance programmer as a temporary Y2K fix based on a windowing technique: two-digit years 00-29 are assumed to represent 2000 through 2029, and years 30-99 represent 1930 through 1999. The modified program ran successfully until December 31, 1999. However, as soon as WS-YEAR was less than 30, the two original statements were no longer executed. The maintainer forgot to end the conditional statement with a period, so the MOVE and WRITE statements became part of the ELSE branch, and the period at the end of the last line now delimits the scope of the IF statement.

Using a separator period as a scope terminator wouldn't have solved the problem in all cases. Suppose that the original code fragment was

IF ws-date > current-date
   MOVE WS-DATE TO maturity-date
   WRITE invoice-record.

and the programmer added the code for the temporary fix but this time remembered the separator period:

IF ws-date > current-date
   IF ws-year < 30
      MOVE 20 TO ws-century
   ELSE
      MOVE 19 TO ws-century.
   MOVE WS-DATE TO maturity-date
   WRITE invoice-record.

What happens now? The period after MOVE 19 TO ws-century ends both IF statements at once, so the original MOVE and WRITE statements fall outside the scope of IF ws-date > current-date: they have been turned from conditional statements into imperative ones that always execute. The "based on English language" approach has caused many unnecessary problems that an explicit scope terminator would have avoided:

IF ws-year < 30
   MOVE 20 TO ws-century
ELSE
   MOVE 19 TO ws-century
END-IF

Newer versions of Cobol provide many explicit scope terminators (e.g., END-IF, END-COMPUTE, END-CALL). Unfortunately, many Cobol programmers still don't use them as part of their programming style.
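To make the nested case concrete, here is a minimal sketch of the earlier nested fragment (IF ws-date > current-date wrapping the windowing IF) rewritten with END-IF. It assumes a fixed-format compiler such as GnuCOBOL (cobc -x). INVOICE-DUE is a hypothetical stand-in for the comparison ws-date > current-date, and DISPLAY stands in for WRITE invoice-record so the program needs no files.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. Y2K-WINDOW.
      * Sketch only. INVOICE-DUE stands in for the condition
      * "ws-date > current-date", and DISPLAY stands in for
      * WRITE invoice-record. These names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-YEAR          PIC 99.
       01  WS-CENTURY       PIC 99.
       01  WS-DUE-FLAG      PIC X.
           88  INVOICE-DUE  VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "Y" TO WS-DUE-FLAG
           MOVE 00 TO WS-YEAR
           PERFORM APPLY-WINDOW
           MOVE 29 TO WS-YEAR
           PERFORM APPLY-WINDOW
           MOVE 30 TO WS-YEAR
           PERFORM APPLY-WINDOW
      *    Not due: nothing at all should be written.
           MOVE "N" TO WS-DUE-FLAG
           MOVE 05 TO WS-YEAR
           PERFORM APPLY-WINDOW
           STOP RUN.

       APPLY-WINDOW.
           IF INVOICE-DUE
      *        Window: 00-29 means 20xx, 30-99 means 19xx.
               IF WS-YEAR < 30
                   MOVE 20 TO WS-CENTURY
               ELSE
                   MOVE 19 TO WS-CENTURY
               END-IF
      *        END-IF closed only the inner IF, so this statement
      *        still belongs to the outer IF INVOICE-DUE.
               DISPLAY "INVOICE WRITTEN FOR " WS-CENTURY WS-YEAR
           END-IF.
```

Running it should print one line each for the years 00, 29, and 30 (as 2000, 2029, and 1930) and nothing for the invoice that is not due, which is exactly the behavior both broken versions lost.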

Further examples are probably unnecessary; if you deal with legacy code, you surely have your own. Although academia doesn't think much of Cobol, there's no such thing as "industry versus academia," so let's look to academic research for ways to revitalize the maintainability of our legacy code.

The Contribution of Sellink, Sneed, and Verhoef

In 1998, A. Sellink, H. Sneed, and C. Verhoef, three researchers at the University of Amsterdam, developed an automatic restructuring tool and published the paper "Restructuring of COBOL/CICS Legacy Systems" (links to a paid-membership-only full version of the paper, as well as a free abridged version, are at the end of this article). I found two interesting characteristics in this paper: (1) it is not written in the typical "academic style," and (2) it contains a series of useful recommendations that apply directly to our day-to-day work. The researchers argue that before putting a "web face" on our applications, we need to restructure them. "One can say that a prerequisite to large-scale renovation is major restructuring," they write, because "restructuring significantly improves maintenance and major enhancements." Next, I discuss several of the paper's recommendations that remain entirely valid today.

Transform Subroutines into Well-Structured Procedural Representations

To transform subroutines into well-structured procedural representations, put the Controller in a coordination section and the Model (i.e., the PERFORMed subroutines) in a subroutines section. The coordination section works like a "main" function: it is the entry point of the program and the logical interface between the modules. The PERFORMed subroutines reside in the subroutines section, where they behave as subprocedures. Let's use the first code example to illustrate the transformation:

main SECTION.

main-paragraph.
   PERFORM open-file
   IF valid-header
      PERFORM read-file
      PERFORM commit-changes
   ELSE
      PERFORM error-handling
   END-IF
   PERFORM close-file.
main-exit.
   EXIT.

bar SECTION.
bar-paragraph.
   GOBACK.

subroutines SECTION.

open-file.
...
   IF transaction-code = "H"
      SET valid-header TO TRUE
   ELSE
      SET invalid-header TO TRUE
   END-IF
...

read-file.
...

commit-changes.
...

close-file.
...

error-handling.
...

Note that a Boolean flag (the condition names VALID-HEADER and INVALID-HEADER) is set in the OPEN-FILE subroutine to record whether the header record exists. Its value is checked in the main section after control returns to the statement following the invoking PERFORM, so the flag acts like a subprocedure's return value. A special section, BAR, sits between the main section and the subroutines section, and it contains a single statement: GOBACK. When control falls through from the end of the main section, it reaches this section and the program ends, so the subroutines are entered only through PERFORM statements.

With this approach, we can easily add new subroutines to the code without losing the original structure.
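The restructured layout above elides the file handling, so here is a complete, runnable sketch of the same coordination/BAR/subroutines arrangement, assuming a fixed-format compiler such as GnuCOBOL. To keep it self-contained, the transaction file is simulated by a hypothetical in-memory table (one "H" header followed by three "D" details), DISPLAY stands in for OPEN, COMMIT, and CLOSE, and a missing header is routed to ERROR-HANDLING. The coordination section is named COORDINATION here; every data name is an assumption of this sketch.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COORD-DEMO.
      * Sketch only. The transaction file is simulated by an
      * in-memory table, and DISPLAY stands in for OPEN, COMMIT,
      * and CLOSE. All data names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  TRANS-DATA.
           05  FILLER       PIC X(4) VALUE "H003".
           05  FILLER       PIC X(4) VALUE "D001".
           05  FILLER       PIC X(4) VALUE "D002".
           05  FILLER       PIC X(4) VALUE "D003".
       01  TRANS-TABLE REDEFINES TRANS-DATA.
           05  TRANS-REC    OCCURS 4 TIMES.
               10  TRANSACTION-CODE  PIC X.
               10  TRANS-BODY        PIC X(3).
       01  WS-IDX           PIC 9 VALUE 0.
       01  WS-COUNT         PIC 9 VALUE 0.
       01  WS-HEADER-FLAG   PIC X VALUE "N".
           88  VALID-HEADER     VALUE "Y".
           88  INVALID-HEADER   VALUE "N".
       01  WS-EOF-FLAG      PIC X VALUE "N".
           88  NO-MORE-RECS     VALUE "Y".
       PROCEDURE DIVISION.
      * Coordination section: the Controller.
       COORDINATION SECTION.
       MAIN-PARAGRAPH.
           PERFORM OPEN-FILE
           IF VALID-HEADER
               PERFORM READ-FILE
               PERFORM COMMIT-CHANGES
           ELSE
               PERFORM ERROR-HANDLING
           END-IF
           PERFORM CLOSE-FILE.
       MAIN-EXIT.
           EXIT.

      * Control falls through into BAR and stops here, so the
      * subroutines below run only when PERFORMed.
       BAR SECTION.
       BAR-PARAGRAPH.
           GOBACK.

       SUBROUTINES SECTION.
       OPEN-FILE.
           DISPLAY "OPEN"
           PERFORM READ-NEXT
      *    The flag plays the role of a return value.
           IF TRANSACTION-CODE (WS-IDX) = "H"
               SET VALID-HEADER TO TRUE
           ELSE
               SET INVALID-HEADER TO TRUE
           END-IF.
       READ-FILE.
           PERFORM READ-NEXT
           PERFORM UNTIL NO-MORE-RECS
               ADD 1 TO WS-COUNT
               DISPLAY "DETAIL " TRANS-BODY (WS-IDX)
               PERFORM READ-NEXT
           END-PERFORM.
       COMMIT-CHANGES.
           DISPLAY "COMMIT " WS-COUNT.
       CLOSE-FILE.
           DISPLAY "CLOSE".
       ERROR-HANDLING.
           DISPLAY "MISSING HEADER RECORD".
      * Simulated READ: advance to the next table entry.
       READ-NEXT.
           IF WS-IDX < 4
               ADD 1 TO WS-IDX
           ELSE
               SET NO-MORE-RECS TO TRUE
           END-IF.
```

Changing the first table entry from "H003" to, say, "D000" exercises the error path: the program should then print OPEN, MISSING HEADER RECORD, and CLOSE. A new subroutine only needs a paragraph in SUBROUTINES SECTION and a PERFORM in the coordination section; no GO TO has to be threaded through the existing flow.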

Eliminate Unnecessary Dots and Add Explicit Scope Terminators

Although it may seem unnecessary at first glance, always use explicit scope terminators. They prevent many of the problems that can appear when code is changed, and they improve its legibility. The following code fragment doesn't use explicit scope terminators:

IF BALANCE > 0
   IF BALANCE > 100
      COMPUTE FINANCE-CHARGE = BALANCE * 0.015
      ADD FINANCE-CHARGE TO BALANCE
   ELSE
      MOVE 1.50 TO FINANCE-CHARGE
      ADD FINANCE-CHARGE TO BALANCE
ELSE
   MOVE 0 TO FINANCE-CHARGE.

Note that the statement ADD FINANCE-CHARGE TO BALANCE appears in both branches of the inner IF statement. Without explicit scope terminators, this duplication is unavoidable in nested IFs, because there is no way to close the inner IF while staying inside the outer one. Now, see the equivalent code with explicit scope terminators:

IF BALANCE > 0
   IF BALANCE > 100
      COMPUTE FINANCE-CHARGE = BALANCE * 0.015
   ELSE
      MOVE 1.50 TO FINANCE-CHARGE
   END-IF
   ADD FINANCE-CHARGE TO BALANCE
ELSE
   MOVE 0 TO FINANCE-CHARGE
END-IF
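As a quick check of the END-IF version, here is a minimal runnable wrapper, assuming a fixed-format compiler such as GnuCOBOL. The PIC clauses, the edited display fields, and the three sample balances are assumptions of this sketch; DISPLAY simply shows the results.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FIN-CHARGE.
      * Sketch only: field sizes and sample balances are made up.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  BALANCE          PIC 9(5)V99.
       01  FINANCE-CHARGE   PIC 9(5)V99.
       01  WS-SHOW-BAL      PIC ZZZZ9.99.
       01  WS-SHOW-CHG      PIC ZZZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 200.00 TO BALANCE
           PERFORM APPLY-CHARGE
           MOVE 50.00 TO BALANCE
           PERFORM APPLY-CHARGE
           MOVE 0 TO BALANCE
           PERFORM APPLY-CHARGE
           STOP RUN.

       APPLY-CHARGE.
           IF BALANCE > 0
               IF BALANCE > 100
                   COMPUTE FINANCE-CHARGE = BALANCE * 0.015
               ELSE
                   MOVE 1.50 TO FINANCE-CHARGE
               END-IF
      *        One ADD serves both inner branches.
               ADD FINANCE-CHARGE TO BALANCE
           ELSE
               MOVE 0 TO FINANCE-CHARGE
           END-IF
           MOVE FINANCE-CHARGE TO WS-SHOW-CHG
           MOVE BALANCE TO WS-SHOW-BAL
           DISPLAY "CHARGE " WS-SHOW-CHG
                   " NEW BALANCE " WS-SHOW-BAL.
```

A balance of 200.00 should get a 3.00 charge (1.5 percent), 50.00 should get the flat 1.50, and a zero balance should get no charge at all.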

Eliminate the GO TO Statement

Replace GO TO constructs with while-loop patterns (in Cobol, PERFORM UNTIL). Sometimes you need to add Boolean variables to the transformed code, but that's a small price to pay for the gain in maintainability. In transaction-processing applications, the GO TO DEPENDING ON statement can be replaced with the EVALUATE statement (Cobol's equivalent of a CASE statement).
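Here is a minimal sketch of both substitutions applied to a menu dispatcher like the GO TO DEPENDING ON example earlier, assuming a fixed-format compiler such as GnuCOBOL. The menu options (1 = add, 2 = change, 0 = exit), the paragraph names, and the canned WS-INPUTS string that stands in for the keyboard are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MENU-DISPATCH.
      * Sketch only. The keyboard is simulated by a string of
      * canned menu choices; the options and paragraph names
      * are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUTS        PIC X(4) VALUE "2150".
       01  WS-POS           PIC 9 VALUE 0.
       01  KBD-ENTRY        PIC 9.
       01  WS-DONE-FLAG     PIC X VALUE "N".
           88  USER-EXITS   VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    PERFORM UNTIL is the while-loop that replaces a GO TO
      *    back to the top of the screen logic.
           PERFORM UNTIL USER-EXITS
               ADD 1 TO WS-POS
               MOVE WS-INPUTS (WS-POS:1) TO KBD-ENTRY
      *        EVALUATE replaces GO TO ... DEPENDING ON kbd-entry.
               EVALUATE KBD-ENTRY
                   WHEN 1
                       PERFORM ADD-CUSTOMER
                   WHEN 2
                       PERFORM CHANGE-CUSTOMER
                   WHEN 0
                       SET USER-EXITS TO TRUE
                   WHEN OTHER
                       DISPLAY "INVALID OPTION " KBD-ENTRY
               END-EVALUATE
           END-PERFORM
           STOP RUN.

      * Each PERFORMed paragraph returns to the statement after
      * the PERFORM, unlike a GO TO target.
       ADD-CUSTOMER.
           DISPLAY "ADD CUSTOMER".

       CHANGE-CUSTOMER.
           DISPLAY "CHANGE CUSTOMER".
```

Unlike GO TO DEPENDING ON, where an out-of-range value silently falls through, WHEN OTHER makes the invalid case explicit. In this sketch the canned input must end with a 0; otherwise WS-POS would run past the end of WS-INPUTS.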

Finally, the last recommendation is more a discipline than a specific technique:

Separate the Transaction Processing Logic From the Business Logic

This is in line with what System iNews columnists have been preaching for many years. They have provided so many arguments that it amazes me when I see code that mixes business rules and database processing with display records, indicators, message subfiles, and so forth. Here, I will simply emphasize one of the many arguments in support of this approach: it lets you put the business rules into service programs and then build a web user interface on top of them. Scott Klement's article cited above offers an RPG perspective that also applies to a Cobol environment.
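A minimal sketch of that separation follows, assuming a compiler that supports nested programs (such as GnuCOBOL) so the example fits in one source file. The business rule reuses the record count carried by the header record described at the start of this article: the batch is accepted only if the header count matches the number of detail records. The program names BATCH-DRIVER and CHECK-COUNT and the sample counts are hypothetical. On IBM i, CHECK-COUNT would more naturally be a separately compiled module bound into a service program, so that a 5250 screen program and a web front end could both call it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BATCH-DRIVER.
      * Sketch only. CHECK-COUNT holds the business rule; the
      * caller only gathers data and reports. Both program names
      * are hypothetical, and CHECK-COUNT is nested here just so
      * the example compiles as a single source file.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-HEADER-COUNT  PIC 9(5) VALUE 3.
       01  WS-DETAIL-COUNT  PIC 9(5) VALUE 2.
       01  WS-RESULT        PIC X.
           88  COUNTS-MATCH VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           CALL "CHECK-COUNT" USING WS-HEADER-COUNT
                                    WS-DETAIL-COUNT
                                    WS-RESULT
           END-CALL
           IF COUNTS-MATCH
               DISPLAY "BATCH ACCEPTED"
           ELSE
               DISPLAY "BATCH REJECTED: COUNT MISMATCH"
           END-IF
           STOP RUN.

      * The business rule, with no screen or file handling.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHECK-COUNT.
       DATA DIVISION.
       LINKAGE SECTION.
       01  LK-HEADER-COUNT  PIC 9(5).
       01  LK-DETAIL-COUNT  PIC 9(5).
       01  LK-RESULT        PIC X.
       PROCEDURE DIVISION USING LK-HEADER-COUNT LK-DETAIL-COUNT
                                LK-RESULT.
       CHECK-PARA.
           IF LK-HEADER-COUNT = LK-DETAIL-COUNT
               MOVE "Y" TO LK-RESULT
           ELSE
               MOVE "N" TO LK-RESULT
           END-IF
           GOBACK.
       END PROGRAM CHECK-COUNT.
       END PROGRAM BATCH-DRIVER.
```

Because CHECK-COUNT knows nothing about screens, indicators, or files, the same rule can be called from a batch job, an interactive program, or a web service without change.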

More Information

The paper "Restructuring of COBOL/CICS Legacy Systems" is available on the Association for Computing Machinery (ACM) paid website.

A free, condensed version of the paper is also available.

