Deadly sins or just bad habits? Either way, there are 11 you need to watch out for…
Let's start at the beginning. Before you can implement a sound application database and applications on top of it, you need a very clear model of the elements of interest in the application domain and how they're related to one another. Logical data modeling identifies the following elements:
There are a variety of methods and notations for creating a data model. Use cases, which describe interactions among actors and systems, are a widely used means of discovering elements that need to be part of the model. Entity-Relationship Diagrams (ERDs) and Unified Modeling Language (UML) are two notations for describing models.
I won't even suggest that any well-run IT organization would commit the cardinal sin of implementing a database and applications without doing data modeling as part of the development process. The "deadly sin" I highlight here is that the distinction between the modeling process and designing the physical database (e.g., tables and views) is often blurred.
A common mistake, for example, is to insist that physical database tables be "normalized" to some degree or other (the rule-of-thumb is often to third normal form) so "facts" are not redundantly stored in the database. Imposing particular normal forms on physical tables both misses the point and can lead to a database that's harder to use and/or performs worse than a "non-normalized" database.
Normal forms serve a purpose in logical data modeling by helping understand and depict the facts that need to be tracked. Take the simple example of orders, items, and prices. When describing the logical model, you would normally have at least the following four tables:
The ItemPrice table would have one or more entries for each item storing the price and beginning (and optionally, ending) date(s) of the period(s) for each different price.
Now, should Price also be part of the OrderItem table in the logical data model? The answer depends on the facts you need to track. When the logical data model does not include Price in the OrderItem table, that's a statement that the price of an item is always the same for all orders on the same day and can therefore be determined by using ItemID and OrderDate to look up the price in the ItemPrice table. On the other hand, including Price in the data model's OrderItem table is a statement the price may vary (e.g., from a negotiated reduction).
Now, let's consider possible physical database designs. Should the physical OrderItem database table contain a Price column? If the logical data model OrderItem table contains this column, thus indicating prices may vary among orders on the same day, the answer is obviously "Yes."
But, even if the logical OrderItem table doesn't contain the price column, we may reasonably decide to redundantly store the item's price when the order was placed to improve performance on retrieval operations. As long as we note this reason in documenting the physical design and implement appropriate code to propagate changes to the ItemPrice table, there is no harm and no "sin" has been committed. (Anytime facts are stored redundantly, applications generally must be sure to update all instances of the fact when any one instance is changed.)
A "deadly sin" can occur in two ways, however: One, a logical data model can be rendered inaccurate when physical design decisions (e.g., redundantly storing facts for performance reasons) are expressed in the logical data model. Two, the physical implementation may not be as convenient or efficient if implementation is a one-for-one mapping of tables in the logical data model to physical database tables.
Many businesses use identifiers that contain semantic information, such as properties of the item that's identified. For example, the first two digits of a bank account number may contain a code for the type of account, e.g., savings, checking, etc. This is convenient for humans who use the account, but can create difficult-to-unwind problems when such identifiers are used as the primary key for rows in the database.
The problem arises because an entity instance's primary key value is stored wherever a reference to the instance is required. Thus, the "fact" of an account's type may be redundantly stored in many places, including offline archives. If the coding system changes or the type of an account changes (for example, by the bank instituting new types of accounts to which old accounts are automatically converted), then updating the "facts" may be complex or impossible.
To avoid this "deadly sin," use meaningless identifiers for primary keys. This doesn't mean users have to switch what they use to identify accounts and other entities, but within the database, primary and foreign key values should be meaningless and never need to be changed.
Your enterprise should be following industry practices and using SQL to define and access IBM i databases. SQL works across all major databases, and IBM is investing all their development resources for the IBM i database to SQL functions, tools, and performance improvements. This doesn't mean you have to immediately rip out all the RPG and Cobol I/O from legacy programs that otherwise don't require much change. For any new development or significant changes to the database or applications, however, SQL is the saintly choice.
The IBM i has had the industry's most sophisticated foundation for database security since the original System/38 introduced object-based authorities. And yet, I still find many IT organizations that don't understand or implement sound security practices for their databases.
Often, the public is granted full authority to database tables, and so-called "menu-based security" or other home-grown techniques are relied upon to control access. These approaches are about as effective as the Pharaoh's attempts to intimidate Moses.
As a general rule, the public should be granted no authority to any database tables except those that contain information that everyone in the world is allowed to view.
To provide access to specific, authorized user profiles, there are several sound techniques. In most cases, it's better to grant object authorities to group profiles and make each individual user profile a member of the appropriate group(s). Whether granting authorities to an individual or user profile, the First Commandment is: "Thou shalt not grant greater authority than the profile should have through any database interface." In other words, don't grant update authority if you want to be sure the profile can only perform limited types of updates.
Granting data rights to one or more view(s) over a database table, but not to the table itself, provides one good option to limit access to specific columns. Column-level security is another option, but managing many different column combinations can be burdensome.
The most underappreciated IBM i database access feature is program-adopted authority. If you're not familiar with this feature or not using it, get thee forth to the temple! With program-adopted authority, you can limit access by finer-grained functions than Fetch (Read), Insert, Delete, and Update. The technique is simple. First, follow the rules above about limiting authorities. Then, to provide access to the data only via a particular function:
When a user who has authority to the program runs it, the user will be able to execute the program's functions and will be allowed the necessary access to the underlying database objects because of the adopted authorities. This powerful, flexible IBM i feature is a great way to provide secure access to remote applications through stored procedures.
SQL for IBM i provides many different choices for column data types. This can lead to a tendency to customize each column with the "perfect" data type and size. In turn, this can result in numerous conversion problems in programs that access the table and can complicate writing SQL functions and stored procedures that operate on values from columns with different types.
Instead, either use an automated tool that supports defining types and generating SQL Create Table statements or use a "column reference table." A column reference table is just a dummy table with columns that serve as type definitions. No data is stored in the table, but the columns in it can be referenced with an "as-subquery" clause in the Create Table statements for other tables. (If you're familiar with the DDS concept of a field reference file, this is the same idea.)
Column reference tables are a real boon to simplification and consistency. With the low cost of auxiliary storage, there's no longer much need to worry about allocating a few unneeded bytes for some columns, so a small number of column types and sizes is usually all you'll need.
One of the temptations when developers first learn SQL is to exploit SQL's cool "null" feature. A table column defined without the Not Null clause allows the column to be set to null in a particular row. Null isn't a value; it's a flag that the column contains no value for the current row. SQL data manipulation statements have various implicit and explicit options for handling null columns. This would sound like a feature that just adds a useful ability to flag when data is absent, for example, an OrderShipDate column prior to the order actually shipping.
But, in the end, null-capable columns can be a nuisance to deal with and may create data selection or calculation errors that go undetected. A couple rules will help avoid trouble:
Even if you follow these rules, be sure you account for SQL's implicit handling of nulls in queries you write over columns that are null-capable or when using expressions that can produce a null result.
The structural meaning and integrity of a relational database depends on using primary and foreign key values to store relationships. For example, specific orders are related to a customer by storing a CustomerID value in the associated rows of the Order table.
By defining relationships among tables, based on SQL specifications for tables' primary and foreign keys, you can have the database system assure the data's referential integrity is maintained. Otherwise, you must make sure that all applications that update the database reliably maintain referential integrity, for example, by not deleting a "parent" row unless the database contains no dependent rows that reference the parent row's primary key.
Enforcing referential integrity by application coding is more work and less reliable than using the database system's built-in capability. Unfortunately, improperly written legacy applications may fail when database-enforced referential integrity is enabled. If you have older applications that create this problem, you may have to do a bit of penance by replacing the code with code that works properly with system-enforced referential integrity.
Database consistency requires that any transaction involving multiple row updates either complete all of the updates or none of the updates. The classic example is that a banking transaction to transfer an amount from a savings account to a checking account must complete updates to both the checking account and savings account records, or neither of these records should be changed.
Put simply, it's impossible to implement full transaction integrity with application code. Fortunately, it's relatively easy and completely reliable to use the IBM i database's "commitment control" feature to assure transaction integrity.
So why do so many "sinners" in the IBM i IT community still not use commitment control (in SQL terms, the transaction isolation level)? Not using commitment control baffles me. This is one case where the wages of sin could be death to your database, your enterprise, or your career. . . . Maybe that's a bit extreme, but running IBM i applications without system-provided transaction integrity is certainly not the way to get through the pearly gates to IT heaven.
Trigger programs are another IBM i database feature that can be "wicked" in either sense of the word. Trigger programs are wicked-cool in all the things you can do by having a program that runs anytime an insert, update, or delete occurs on a database table (or view), regardless of which database interface the operation occurs through. But they can also be wicked-wicked in that unreliable trigger program can cause many problems to unsuspecting applications.
The Golden Rule is: Trigger programs should do unto other applications as you would have other trigger programs do unto your applications.
Above all else, that means your trigger programs must have exception handling that makes them bulletproof. While it may be appropriate for a trigger program that encounters an unexpected condition to cause the current database operation to fail, the trigger program should do so by completing in an orderly fashion and setting an appropriate SQL State.
The SQL runtime intercepts all database exceptions and sets the SQL State value to an appropriate value that categorizes and identifies the exception. Applications that fail to check SQL State after an operation can continue running "normally" after an exception leading to invalid or incomplete results and errors that manifest themselves many instructions beyond where the exception arose. There is a special place in that "warm spot" for the departed where programmers who commit this sin are sent.
Checking SQL State is one of those rules that must be observed without exception. The worst form of this sin is to check SQL State after most SQL statements, but then omit the check for a statement the programmer "knows" couldn't possibly cause an exception. When the unlikely exception does occur, it may be after the program has run properly for years. The poor soul assigned to diagnose the problem may waste an eternity tracking down the cause.
Even sharp, experienced database designers and programmers can't always determine the best design for database objects or the most efficient way to access them in an application just by applying "best practices." In many cases, the only way to deliver top-performing applications is to analyze execution results in a production environment or one similar to it. The developer who ignores this fact will bring a pox upon the enterprise's database performance.
There are two requirements for delivering great database performance:
Database designers and programmers who are familiar with SQL performance concepts and tools can burnish their IT halos by delivering applications that maximize database performance.
Take it from a true believer, your life will be much easier if you avoid the "deadly database sins" I've described—though I can't guarantee you'll reach the database Promised Land. But, at least you'll avoid spending too much time in that other place.
Paul Conte is a System iNEWS senior technical editor.