ACME Enterprises International, which employs thousands of people worldwide, has suffered a security breach within the organization. A disgruntled individual has stolen a list of employee salary information and sent it to Big Brother News (BBN). To make matters worse, the salary information discloses that several top executives received substantial bonuseseven though ACME reported a loss in earnings.
Several years ago the ACME Database Engineer (DBE) approached management with a plan to modernize the existing database based on the IBM Data Access Modernization strategy documented in the IBM Redbook “Modernizing IBM eServer iSeries Application Data Access a Road-map Cornerstone.” Management refused, citing that age-old business maxim “If it ain’t broke don’t fix it.” Today ACME management summoned the DBE and told him “The system is broke; fix it and fix it fast.
The above scenario suggests one of many possible business reasons for modernizing IT, albeit ACME would rather have been modernizing for more positive reasons (e.g., business growth). In today’s economy many organizations face tough decisions and are forced to make significant cuts in operating budgets. Now is the perfect time to enhance employee skills via a database reengineering project. Keeping employees together and investing in research and development was the approach former IBM CEO Thomas Watson Sr. took during the extreme depression of the 1930’s with excellent results. You can read more about this in the book "The Maverick and His Machine: Thomas Watson Sr. and the Making of IBM.”
The primary goal for any database reengineering project is to minimize the impact of change on the business. The IBM Database Modernization strategy was developed with that goal in mind. Thus the strategy is implemented in stages and within each stage are multiple phases. Phases can be further broken down by steps or tasks. These steps and tasks need not be done all at once.
In addition, the strategy is based on IBM’s DB2 for i SQL Optimization best practices. This means that many of the techniques recommended for data access modernization are optimal for DB2 for i SQL performance and scalability.
The main stages of the IBM Database Modernization Strategy are
You can read about stages 1 and 2 in previous articles: “Performance Comparison of DDS-Defined Files and SQL-Defined Files” (May 2005) and “Understanding Access Plans and Open Data Paths” (December 2005), respectively. This article focuses on Stages 3, 4 and 5 and how you can apply them to correct, protect, and enhance an existing database. ACME, during stages 1 and 2, settled on the IBM Rational suite of tools for its Integrated Development Environment (IDE). This article contains several screen shots from the IBM Rational Data Architect (RDA) product.
The DB2 Sample Schema (which is shipped as part of the System i OS) is used as a basis for the reengineering examples contained within this article. You can create this schema by executing the stored procedure CREATE_SQL_SAMPLE from the System i Start SQL Interactive Session (STRSQL) interface or from within an SQL scripting tool such as System i Navigator Run SQL scripts.
The following is an example of calling the procedure (note: replace SCHEMA-NAME with a name of your choosing; for example DB2SANDBOX):
CALL QSYS.CREATE_SQL_SAMPLE('SCHEMA-NAME');
Although the database modernization strategy was originally based on non-SQL database modernization, you can apply the methods discussed in this article to any database reengineering project. .
Two major flaws within the employee database design led to the security breach at ACME. The first was the use of the employee number column (EMPNO) as both a unique and primary key. (You can read more about primary and unique differences in the Centerfield Technology February-March 2009 newsletter at http://www.centerfieldtechnology.com/. This oversight caused the values contained within the EMPNO column to be propagated as foreign keys throughout the database. The second design flaw was combining both biographical information and payroll information within a single data store, in this case the EMPLOYEE table. This became apparent to the DBE as the employee database was analyzed using modern design and development tools. Figure 1 contains a screenshot from the RDA Physical Data Model (PDM) diagram for the employee database. Fortunately for the DBE the database fields used descriptive names, which aided in quickly identifying the design exposures.
In Figure 1, you can see that several views exist that do not contain the payroll information (VEMP, VEMPLP, VEMPDPT1). This is a good first attempt at protecting sensitive data from enquiring minds. In addition, the DBE was solely responsible for creating the views and providing the required permissions; another good practice. Unfortunately, this did not prevent sensitive data from being copied from the production tables to test versions of the tables. To correct the design flaws, the DBE proposed a reengineering project for the existing employee database.
The Wikipedia dictionary defines reengineering as "the application of technology and management science to the modification of existing systems, organizations, processes and products in order to make them more effective, efficient and responsive." In other words, a successful reengineering project requires the right tools and a good plan. The following are some of the tasks that are part of the Employee database reengineering plan put in place by the ACME DBE. Utilize modern development tools to complete the following tasks:
The Employee_Bio table contains only the columns considered non-sensitive or non-volatile. The sensitive, volatile data columns have been moved to the Employee_Pay table. The issue of volatility can play a major factor when changes are made to tables being logged (journaled when using DB2 i). In essence, if entire row images are logged then sensitive data is logged when an employee phone number changes. Or the entire biography image is logged when pay information changes. In addition, performance indexes supporting aggregate functions (e.g., MIN, MAX) over payroll columns need to be maintained when inserting new employees or deleting inactive employees. This is a major concern for ACME’s manufacturing and retail divisions where employment is seasonal and employee turnover is high.
ACME has adopted the following rules for all new or reengineered tables:
To satisfy the above requirements the DBE created a new column (EMP_ID) defined as LONG in the LDM (this will be transformed to a BIGINT for DB2 i). In addition, it is designated as the Primary Key (primary key constraint), surrogate (AS IDENTITY) and the values are derived (auto-generated) using default values (Figure 3).
The row change timestamp columns are added as part of the LDM design. The 6.1 IMPLICITLY HIDDEN and auto-generation timestamp capabilities will be added as part of the PDM deployment.
The EMP_ID field is used to create a relationship between the Employee_Bio and the Employee_Pay tables. In essence, EMP_ID becomes a foreign key. Although EMPNO is currently used in existing relationships, it is considered by ACME to be sensitive and thus cannot be propagated across tables. The EMP_ID field will be used to establish primary-foreign key relationships from this point forward.
The cardinality between Employee_Bio and Employee_Pay is designated as Zero or One. This means an Employee_Bio record can exist without an Employee_Pay record. An added benefit to ACME is that the company can produce a list of active employees on the payroll (i.e., employees with a pay record) by joining the Bio and Pay tables together using the EMP_ID column. This eliminates the need for a “status” flag in the employee tables designating active or inactive (more on this during the INSTEAD OF trigger discussion). The relationship properties are shown in Figure 4.
Once the LDM design is complete, reviewed and approved, the DBE transforms the LDM into a PDM based on the target RDBMS (DB2 i in this case). This is as easy as 1-2-3. First, choose Transform->Physical Data Model from the RDA menu bar. Second, select Create new model from the Transform To Physical Data Model pop-up window. Third, choose the target database (DB2 i of course), press Next and take defaults on all remaining panels (Figure 5).
Before deploying the new tables the DBE performs some additional fine tuning to the generated DDL script. In essence, the current version of RDA used by the ACME DBE does not support the new 6.1 IMPLICITLY HIDDEN clause or the auto-generated row change timestamp capabilities. The DBE will overcome this limitation by using the Data->Generate DDL option from the PDM menu bar and then modifying the DDL script using Snippets.
In essence, Snippets provide the capability of creating reusable code templates that you can insert into existing source code. In addition, the templates can contain replacement variables, allowing the inserted code to be customized (if needed) for each use (Figure 6).
The SQL full-select statement within the CREATE VIEW statement uses the concept of SELECT table-correlation-name.* to reference all of the columns from the base tables. Prior to 6.1 this statement would return all columns from each base table; not a good practice. The IMPLICITLY HIDDEN clause (new in 6.1) allows the DBE to use a quick and dirty approach to creating the view. In essence, only the columns that are not hidden are used to create the view format as shown in the right-hand side of Figure 7. The RDA Generate DDL feature will return a complete column list when a future modification is required to the EMPLOYEE view.
Due to the security breach ACME has abolished the practice of doling out all object authority to developers on production systems. In addition, the public is excluded from using any view without permission. In essence, all database access must be fully authorized. The DBE is now responsible for granting the permissions to tables and views.
System i Navigator is the GUI interface for administering database maintenance such as object permissions; however, this can be quite cumbersome when making mass modifications. The ACME DBE has developed stored procedures for executing IBM i OS commands that grant and revoke object permissions during the object deployment process.
Figure 8 contains an example of an SQL Stored Procedure used to change the object ownership of all tables in a given schema.
The EMPLOYEE view works fine for procedures or queries that perform read-only functions, but what about those existing procedures performing insert, update or delete operations against the legacy EMPLOYEE table? No problem when you employ INSTEAD OF trigger support. An INSTEAD OF trigger enables the update, delete or insert operations to be performed against an SQL view that is considered to be non-updateable, non-deleteable or non-insertable. The new EMPLOYEE view contains a join between the Employee_Bio and Employee_Pay tables. Join views are normally not valid as the target table of an update, delete or insert operation unless an INSTEAD OF trigger is attached to the view.
Table 1 contains some basic INSTEAD triggers to handle simple inserts, updates and deletes against the EMPLOYEE view.
At this point in the project you’re ready to generate the test data. ACME utilizes a third-party product called DTM Test Data Generator. You can learn more about this and other test data generators by using any of the available Internet search engines to locate the software providers of these products (including IBM products). A characteristic of a good test data generator tool is the ability to generate data based on the physical data model. For the purposes of this article, assume the DB2 Sample database is a legacy database containing generated test data.
The following are some examples of inserts, deletes and updates against the EMPLOYEE view using the INSTEAD OF trigger support.
This example uses an INSERT with sub-select to populate the new test employee_bio and employee_pay tables from the test employee table.
--Set default schemaThe following query verifies the data is returned from the join view correctly.
SET SCHEMA DB2SANDBOX;
--Copy data from legacy test employee table to new test employee tables
INSERT INTO EMPLOYEE SELECT * FROM DB2_SAMPLE.EMPLOYEE;
COMMIT;
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT,Table 2 contains the results of the above query.
SALARY, BONUS, COMM
FROM EMPLOYEE
FETCH FIRST 5 ROWS ONLY;
The next query validates that the EMP_ID and EMP_BIO_ROW_CHANGE_TS were automatically generated for the Employee_Bio table. Note the query is using the new 6.1 ROW CHANGE expression to list the rows that were added in the last 24 hours. The ROW CHANGE expression is only allowed for tables that have a row change timestamp.
SELECT EMPNO, EMP_ID, EMP_BIO_ROW_CHANGE_TSTable 3 contains the results of the above query.
FROM EMPLOYEE_BIO
WHERE ROW CHANGE TIMESTAMP FOR EMPLOYEE_BIO >
CURRENT TIMSTAMP - 24 HOURS
FETCH FIRST 5 ROWS ONLY;
The next query validates that the EMP_ID was assigned and the EMP_PAY_ROW_CHANGE_TS was automatically generated within the Employee_Pay table. Note the correlation name A for EMPLOYEE_PAY must be used for the ROW CHANGE expression.
SELECT EMPNO, A.EMP_ID, EMP_PAY_ROW_CHANGE_TSTable 4 contains the results of the above query.
FROM EMPLOYEE_PAY A
JOIN EMPLOYEE_BIO B ON A.EMP_ID = B.EMP_ID
WHERE ROW CHANGE TIMESTAMP FOR A >
CURRENT TIMSTAMP - 24 HOURS;
FETCH FIRST 5 ROWS ONLY;
SELECT EMPNO FROM EMPLOYEE WHERE EMPNO = '200330';The next step is to delete the row using the EMPNO unique key:
Result set contains 200330
DELETE FROM EMPLOYEE WHERE EMPNO = '200330';The final step is to verify the row no longer exists:
COMMIT;
SELECT EMPNO FROM EMPLOYEE WHERE EMPNO = '200330';
No rows returned
UPDATE EMPLOYEE SET BONUS = SALARY * .03;The following query verifies the results of the update. The derived BONUS_PCT column verifies that the bonus amount is 3 percent of the salary amount.
COMMIT;
SELECT EMPNO, SALARY, BONUS,Table 5 contains the results of the above query.
DECIMAL(BONUS/SALARY,3,2) AS BONUS_PCT
FROM EMPLOYEE
FETCH FIRST 5 ROWS ONLY;
INSTEAD OF triggers have many other uses that can further benefit ACME. For example, you can use an INSTEAD OF trigger as an alternative to updating status flags used for archive and purge purposes. In essence, you can transform a DELETE employee statement to an INSERT INTO EMPLOYEE_ARCHIVE and DELETE EMPLOYEE_PAY statement. The employee record will no longer appear in the EMPLOYEE view because the pay record no longer exists. Spreading updates and deletes over the course of the business day eliminates the need to perform long-running purge or archive batch jobs.
For the developers at ACME there will be changes. As more restrictions are imposed on production, backup and High Availability (HA) systems, more freedom is being given to the users of development systems. The ACME developers now have permission to use analysis tools previously available only to administrative staff, or that were locked down by operations staff to prevent possible system failures.
ACME management is discovering that it is less expensive to license new development tools to a smaller system. In fact, these smaller systems can be partitioned to support multiple OS releases, allowing developers to enhance their skills by trying new features and functions without having to wait for production systems to be upgraded. Over time, these systems can begin to support multiple databases using Independent ASP (IASP) technology. For example, you can use one database for unit testing, another for component testing and a third for regression testing rather than using different libraries for different levels of testing.
It’s true that all work and no play makes for a long day. As children we are taught to mind our P’s and Q’s in the classroom, but when the recess bell rings we are given our freedom. We learn the playground is where we can run, jump, scream and yell. It is there that we learn the values of teamwork and friendship. We learn that making room for others and sharing the sand in the sandbox (not hoarding it, throwing it or eating it-especially lumpy sand) is a win-win for everyone. Like the playground, an open development and testing environment is where the IT professionals learn the real skills needed to be successful.