In the late 1950s, Cobol was designed by the people of CODASYL with the objective of building a programming language that had the following characteristics:
The general perception is that the first three objectives were fully achieved, but the fourth was not. Those of us who deal with legacy systems can attest that Cobol is not easy to maintain. Many Cobol applications were built as monolithic entities that resist change. In this article, I discuss some of the reasons that Cobol is so difficult to maintain and explain how we can improve its modifiability.
A newly created Cobol program can be designed with special attention to structured programming rules. Nonetheless, business-critical Cobol code needs to be modified many times by many programmers. When these changes are carried out using different programming styles and under strict time pressure, often the original structure of the code is corrupted. Moreover, because the ultimate purpose of each change is either economic profit or an imperative legal requirement, programmers are under pressure to implement it at minimum cost in limited time. These circumstances can lead to a sloppy "copy and paste" approach that disregards the original structure of the code.
What's more, many programmers find the new functionality easier to implement if they circumvent the rules of structured programming. I illustrate this point with an example taken from real life: Transactions are received from a business partner in a flat file that contains a header record (identified by the transaction code "H") and one or more detail records (identified by the transaction code "D"). The header record contains information such as the transaction date, a record count, and several checksum fields.
Now, consider the following Cobol program section, which is invoked by a PERFORM statement in this way: PERFORM 01-FILE THRU FILE-EXIT.
01-file SECTION.
open-file.
OPEN INPUT transactions.
READ transactions
INTO ws-header-area
AT END CONTINUE.
READ transactions
AT END CONTINUE.
read-file.
PERFORM UNTIL end-of-file
...
READ transactions
AT END CONTINUE
END-READ
END-PERFORM.
commit-changes.
COMMIT.
close-file.
CLOSE transactions.
file-exit.
EXIT.
At the OPEN-FILE paragraph, the first READ statement gets the header record and places it in a special area of the WORKING-STORAGE section, and the second READ statement gets the first detail record. At the READ-FILE paragraph, each detail record is processed until the end-of-file condition is detected. Then changes are committed, and the file is closed.
The code was originally written with an "optimistic" approach (no exception or error handling). But soon the need arose to add some exception handling and error reporting routines. Sometimes, the file was received from the business partner without the header record. When that happened, it was impossible to process the file. The maintenance programmer was asked to fix that problem. To separate the main functionality from the error handling, the programmer decided to place the new paragraph at the end of the section:
01-file SECTION.
open-file.
OPEN INPUT transactions.
READ transactions
INTO ws-header-area
AT END CONTINUE.
IF transaction-code NOT = "H"
GO TO error-handling-paragraph.
READ transactions
AT END CONTINUE.
read-file.
PERFORM UNTIL end-of-file
...
READ transactions
AT END CONTINUE
END-READ
END-PERFORM.
commit-changes.
COMMIT.
close-file.
CLOSE transactions.
GO TO file-exit.
Error-handling-paragraph.
...
GO TO close-file.
file-exit.
EXIT.
The addition of a single paragraph made the code more complex, and now more time is needed to understand the new control flow: execution jumps forward to the error-handling paragraph, back to CLOSE-FILE, and forward again to FILE-EXIT. Often, a relatively small modification involves adding several GO TO statements as shortcuts. The result falls into the category of "spaghetti code," and structured code gradually becomes unstructured.
Interactive Cobol applications can be viewed as event-driven programming. A display file record is written on the screen, and the program waits until the user does something with the keyboard, for example pressing any function key or choosing any option available in the display file record. Then the program branches to the appropriate module. Using terminology from the Model-View-Controller (MVC) pattern, you could say that in interactive programs, the Controller and the View are closely tied together, as Scott Klement points out in his article "Writing Reusable Service Programs". A common construct for the Controller in transactional Cobol programs is the antique GO TO DEPENDING ON statement:
GO TO
label1
label2
label3
label4
label5
. . .
DEPENDING ON kbd-entry
In this example, the kbd-entry field contains a numeric value selected by the user from a menu. Note that control does not return automatically from the specified labels (i.e., label1, label2, etc.): unlike PERFORM, GO TO has no continuation point to come back to. This style of programming inevitably leads to unstructured code.
On the academic side, it has been said that Cobol syntax is horrible because it was meant to be easy to read. The designers thought that English-like semantics would enable programmers without an engineering background to write application software, ignoring that the complexity of programming has little to do with syntax. Edsger Dijkstra discusses this in his paper "Two Views of Programming." Like the English language, Cobol sentences consist of a statement or a series of statements followed by a dot ("."). Earlier versions of the language didn't include explicit scope terminators, so programmers had to use the separator period to indicate the end of a sentence. Even today, many programmers do not use explicit scope terminators. Now, consider the following piece of code that was necessary to amend during the Y2K preparation:
IF ws-year < 30
MOVE 20 TO ws-century
ELSE
MOVE 19 TO ws-century
MOVE WS-DATE TO maturity-date
WRITE invoice-record.
The last two lines were the original code; the first four lines were added by the maintenance programmer. The purpose was to provide a temporary fix for the Y2K problem using a windowing technique: years entered as 00-29 are assumed to represent 2000 through 2029, and years entered as 30-99 represent 1930 through 1999. The modified program ran successfully until Dec. 31, 1999. However, as soon as WS-YEAR is less than 30, the original code is no longer executed. The problem is that the maintainer forgot to end the conditional statement with a dot, so the dot at the end of the last line now delimits the scope of the IF statement, and the two original statements have become part of the ELSE branch.
Using a separator period as a scope terminator wouldn't have solved the problem in all cases. Suppose that the original code fragment was
IF ws-date > current-date
MOVE WS-DATE TO maturity-date
WRITE invoice-record.
and the programmer added the code for the temporary fix but this time remembered the separator period:
IF ws-date > current-date
IF ws-year < 30
MOVE 20 TO ws-century
ELSE
MOVE 19 TO ws-century.
MOVE WS-DATE TO maturity-date
WRITE invoice-record.
What happens now? The original code was put outside the scope of the IF statement, and it was transformed from a conditional to an imperative statement! The "based on English language" approach has caused a lot of unnecessary issues, which would have been avoided with the use of an explicit scope terminator:
IF ws-year < 30
MOVE 20 TO ws-century
ELSE
MOVE 19 TO ws-century
END-IF
New versions of Cobol include a lot of explicit scope terminators (e.g., END-IF, END-COMPUTE, END-CALL). Unfortunately, many Cobol programmers aren't yet using them as part of their programming style.
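To make the effect of END-IF concrete, here is a minimal, self-contained sketch of the windowing fix. It is an illustration under assumptions, not the original production code: the program name, the WS-DATE layout (century, year, month, day), and the sample values are hypothetical, and WRITE invoice-record is replaced by DISPLAY so the fragment can run without a file. The source is written in fixed format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. WINDOWFX.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical layout: century digits, then YYMMDD.
       01  WS-DATE.
           05  WS-CENTURY        PIC 99.
           05  WS-YEAR           PIC 99.
           05  WS-MONTH          PIC 99.
           05  WS-DAY            PIC 99.
       01  MATURITY-DATE         PIC X(8).
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * A year inside the 00-29 window.
           MOVE 05 TO WS-YEAR
           MOVE 03 TO WS-MONTH
           MOVE 15 TO WS-DAY
           PERFORM SET-MATURITY-DATE
      * A year inside the 30-99 window.
           MOVE 98 TO WS-YEAR
           PERFORM SET-MATURITY-DATE
           GOBACK.

       SET-MATURITY-DATE.
           IF WS-YEAR < 30
               MOVE 20 TO WS-CENTURY
           ELSE
               MOVE 19 TO WS-CENTURY
           END-IF
      * END-IF closed the conditional, so the two statements
      * below run for every year, not only in the ELSE branch.
           MOVE WS-DATE TO MATURITY-DATE
           DISPLAY "MATURITY DATE: " MATURITY-DATE.
```

With END-IF in place, both calls reach the MOVE and DISPLAY statements, so the sketch should print a maturity date for the year 05 (century 20) as well as for the year 98 (century 19). The only separator period left in the paragraph is the one that ends it.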
I believe it's unnecessary to provide further examples, because surely you have your own experience in the field of dealing with legacy code. Although academia doesn't appreciate Cobol, there's no such thing as "industry versus academia." Therefore let's look to academia to revitalize the maintainability of our legacy code.
In 1998, A. Sellink, H. Sneed, and C. Verhoef, three researchers at the University of Amsterdam, developed an automatic restructuring tool and published the paper "Restructuring of Cobol/CICS Legacy Systems" (links to a paid-membership-only full version of the paper, as well as a free abridged version of the paper, are at the end of this article). I found two interesting characteristics in this paper: (1) it is not written in the typical "academic style," and (2) it contains a series of useful recommendations that are completely applicable to our day-to-day work. The researchers postulate that before we put a "web face" on our applications, the applications need to be restructured. "One can say that a prerequisite to large-scale renovation is major restructuring," they say, because "restructuring significantly improves maintenance and major enhancements." Next, I discuss some of the paper's recommendations that remain valid today.
To transform subroutines into well-structured procedural representations, it's necessary to put the Controller in a coordination section and the Model (i.e., the PERFORMed subroutines) in a subroutines section. The coordination section operates as the "main" function, in the sense that it's the entry point of the program and the logical interface between the modules. PERFORMed subroutines reside in the subroutines section, and they behave as subprocedures. We will use our first example code fragment to illustrate the transformation process:
main SECTION.
main-paragraph.
PERFORM open-file
IF valid-header
PERFORM read-file
PERFORM commit-changes
END-IF
PERFORM close-file.
main-exit.
EXIT.
bar SECTION.
bar-paragraph.
GOBACK.
subroutines SECTION.
open-file.
...
IF transaction-code = "H"
SET valid-header TO TRUE
ELSE
SET invalid-header TO TRUE
END-IF
...
read-file.
...
commit-changes.
...
error-handling.
...
Note that a Boolean variable is set at the OPEN-FILE subroutine to indicate the existence of the header record. Its value is checked at the main section after control-flow returns to the continuation point of the invoking PERFORM statement. Therefore, the flag acts as a return value in a subprocedure. A special section has been added between the main section and the subprocedures section: the BAR section, which has only one statement: GOBACK. The program ends when control-flow reaches this section.
With this approach, we can easily add new subroutines into the code, without losing the original structure.
Although at first glance it may seem unnecessary, always use explicit scope terminators. Doing so avoids many of the issues that appear when the code is changed, and it improves the code's legibility. The following code fragment doesn't use explicit scope terminators:
IF BALANCE > 0
IF BALANCE > 100
COMPUTE FINANCE-CHARGE = BALANCE * 0.015
ADD FINANCE-CHARGE TO BALANCE
ELSE
MOVE 1.50 TO FINANCE-CHARGE
ADD FINANCE-CHARGE TO BALANCE
ELSE
MOVE 0 TO FINANCE-CHARGE.
Note that the statement ADD FINANCE-CHARGE TO BALANCE occurs in both branches of the inner IF statement. This duplication is unavoidable if we don't use explicit scope terminators in nested IFs, because a separator period would end the outer IF as well as the inner one. Now, see the equivalent code with explicit scope terminators:
IF BALANCE > 0
IF BALANCE > 100
COMPUTE FINANCE-CHARGE = BALANCE * 0.015
ELSE
MOVE 1.50 TO FINANCE-CHARGE
END-IF
ADD FINANCE-CHARGE TO BALANCE
ELSE
MOVE 0 TO FINANCE-CHARGE
END-IF
Substitute the GO TO constructs with while-loop patterns. Sometimes it's necessary to add Boolean variables to the transformed code, but that's a small price to pay for the gain in maintainability. In transaction processing applications, the GO TO DEPENDING ON statement can be substituted with the EVALUATE statement (the Cobol equivalent of the CASE pattern).
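The following sketch shows one way both substitutions might look together: a PERFORM loop controlled by a Boolean flag takes the place of a backward GO TO, and EVALUATE takes the place of GO TO DEPENDING ON. Everything except kbd-entry is an assumption made for illustration: the program name, the menu options, the paragraph names, and the MENU-DONE flag are hypothetical. To keep the fragment runnable without a display file, the user's choices are simulated from a literal; a real program would read kbd-entry from the screen instead.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MENUDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  KBD-ENTRY             PIC 9 VALUE 0.
      * Hypothetical stand-in for user input: options 2, 7, 1, 9.
       01  WS-SIMULATED-KEYS     PIC X(4) VALUE "2719".
       01  WS-KEY-INDEX          PIC 9 VALUE 1.
      * Boolean flag that replaces the backward GO TO.
       01  WS-DONE-FLAG          PIC X VALUE "N".
           88  MENU-DONE         VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
           PERFORM VARYING WS-KEY-INDEX FROM 1 BY 1
                   UNTIL MENU-DONE OR WS-KEY-INDEX > 4
               MOVE WS-SIMULATED-KEYS(WS-KEY-INDEX:1)
                   TO KBD-ENTRY
      * Each WHEN performs a paragraph and then returns here,
      * which GO TO DEPENDING ON cannot do.
               EVALUATE KBD-ENTRY
                   WHEN 1
                       PERFORM ADD-RECORD
                   WHEN 2
                       PERFORM CHANGE-RECORD
                   WHEN 3
                       PERFORM DELETE-RECORD
                   WHEN 9
                       SET MENU-DONE TO TRUE
                   WHEN OTHER
                       DISPLAY "INVALID OPTION"
               END-EVALUATE
           END-PERFORM
           GOBACK.

       ADD-RECORD.
           DISPLAY "ADD SELECTED".
       CHANGE-RECORD.
           DISPLAY "CHANGE SELECTED".
       DELETE-RECORD.
           DISPLAY "DELETE SELECTED".
```

Because every branch returns to the statement after END-EVALUATE, a new menu option can be added as one more WHEN clause and one more paragraph, without touching the existing control flow. The WHEN OTHER clause also handles out-of-range values explicitly, whereas GO TO DEPENDING ON simply falls through to the next statement.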
And finally, the last recommendation is more a discipline than a specific technique:
This is in line with the preaching we've been hearing from System iNews columnists for many years. They have provided so many arguments that it amazes me when I see code that contains business rules and database processing along with display records, indicators, message subfile issues, and so forth. Here, I will simply emphasize one of the many arguments in support of this approach: it lets you put the business rules into service programs and create a web user interface for them. Scott Klement's article cited above is written from an RPG perspective, which is also applicable to a Cobol environment.
A free, condensed version of the paper is also available.