Published on System iNetwork (http://systeminetwork.com)
Use SQL to Manage Complex and Non-Relational Data in DB2
By bradforde
Created Jan 12 2009 - 23:18

By:
Kent Milligan [1]

To keep up with new data requirements from users and customers, you should regularly update your database model. But as more users capture data with workstation tools and exchange data over the internet, many data formats don't always fit into traditional columns or fields.

Many DB2 for i features are available to help you manage data in complex and non-relational formats. Hopefully it's no surprise to you that all of the DB2 functions I'll discuss here require SQL at some level—SQL is the strategic interface for DB2 for i, so jump on board!

LOB Approach

There are three DB2 large object (LOB) data types: binary large object (BLOB), character large object (CLOB), and double-byte character large object (DBCLOB). All let you define a database column capable of storing complex and non-relational data formats (e.g., product brochures in PDF format). Here's an example of a simple table definition that does this:

CREATE TABLE products(
  prodID INTEGER,
  prodName CHAR(30),
  prodDescription VARCHAR(128),
  prodBrochure BLOB(10 M) )

As you can see from the last line of the definition, the prodBrochure column in the products table can store PDF documents up to 10 MB. Note that large object columns have a maximum size limit of two GB, so keep this in mind when adding a LOB column to your database—these columns can get quite large.

By default, LOB columns are stored with variable length columns as shown in Figure 1a [2] in the variable length auxiliary storage container. Note that whenever DB2 for i accesses one column stored in the auxiliary storage container, all of the columns in the auxiliary storage container are paged into memory. So, even though the following query makes no reference to the prodBrochure column, this LOB value will be paged into memory by DB2 due to the prodDescription variable-length column reference.

SELECT prodID, prodDescription FROM products 
   WHERE prodName='FASTENERS'

Obviously, paging an extra 10 MB of storage will affect performance, but there are two methods for addressing this issue. First, using the ALLOCATE keyword isolates any LOB columns from variable length columns by allocating space for the variable length column in the fixed length storage container. Note that the fixed length storage container is limited to a record width of 32KB. Here's how I defined the products table using this technique:

CREATE TABLE products_alloc(
  prodID INTEGER,
  prodName CHAR(30),
  prodDescription VARCHAR(128) ALLOCATE(128),
  prodBrochure BLOB(10 MB) )

By updating the definition, I have ensured that the prodBrochure BLOB column will be the only column stored in the auxiliary storage container, as shown in Figure 1b [3]. Thus, the BLOB column will now only be paged into memory when it's explicitly referenced and will no longer be paged into memory when the prodDescription column is referenced, which gives us increased storage. Now, 128 bytes of storage will be allocated for every value in the prodDescription column, effectively turning this column into a fixed length character column.

Another technique is to place the LOB column in a new table instead of adding it to an existing table. As in the previous approach, this method ensures that the LOB column will only be accessed when the application explicitly references the new table. A major benefit of this approach, however, is that your existing table or file definition can remain as-is. As a result, you will only have to change those programs that need to reference or access the LOB column. In the example below, you can see that the LOB column has been moved to a second table. I have also replicated the product key in the new table so the brochures can link to the product data in the main products table.

CREATE TABLE products2(
  prodID INTEGER,
  prodName CHAR(30),
  prodDescription VARCHAR(128)   )

CREATE TABLE brochures2(
  prodID INTEGER,
  prodBrochure BLOB(10 M) )

An additional benefit of the LOB approach is that you do not have to alter your database backup process. Because the LOB values reside within the table object itself, backing up non-relational and complex data is as simple as saving a DB2 table.

So how do you get non-relational objects, such as spreadsheets and PDF documents, into a LOB column? Instead of using low-level routines to write the complex objects byte by byte, you can utilize SQL file reference variables to simplify the task of populating a LOB column. These variables contain a file's complete path name, path length, and specified action. (The file action would be set to "Read" when populating a LOB column value with the contents of an IFS file.)

SQL file reference variables also make it easy for applications to use LOB column values. When an application or user needs to access complex data stored in a LOB column, that access cannot be gained directly from the application. Instead, an application must first extract the information from the LOB column and place it in a file object that is accessible by the application. For example, if a LOB column contains a JPEG image of an item, that image would have to be copied from the LOB column into an IFS file before the JPEG image could be displayed by the application.

Figure 2 [4] shows an SQL file reference variable in action. Here, the file reference variable (named MYFILE) is being used to take a PDF file object stored in a BLOB column and copy the contents to an IFS file. First, the user must assign the path name of the target file and the length of that target file path name. With the target file location identified, the next step is to assign the file action. I have assigned the Overwrite action in this example, which either creates a new file with the specified name or overwrites any existing file with the same name.

With the MYFILE file reference variable ready to go, a user needs only to reference it on a Select statement. In our example, the Select statement retrieves the prodBrochure BLOB column and specifies that the contents of the column be assigned to the MYFILE file reference variable. During this assignment, DB2 will perform the specified file action and write the contents of the BLOB column into the target file. In this instance, using SQL file reference variables has made the coding very straightforward.

Linked Approach

Some applications may not want to wait to copy non-relational data from a DB2 LOB column each time that data is needed. In this case, the non-relational data may need to be stored outside of the database, which you can do with the Datalink data type. The Datalink data type supports the storage of a Universal Resource Locator (URL) value to store information about an external file's location and link it to any related data in the DB2 table.

Let's revisit the product brochure example from a Datalink perspective. In this example, the prodBrochure column has been changed to use the Datalink data type instead of the BLOB type:

CREATE TABLE products_dl(
  prodID INTEGER,
  prodName CHAR(30),
  prodDescription VARCHAR(128),
  prodBrochure DATALINK(250) )

A Datalink column value is actually an encapsulated value consisting of three parts:

  • the URL value
  • the optional comment
  • link-type (URL is the only type supported)

The comment string can contain a maximum of 254 bytes. Some applications may use the comment field to store information about the linked file (e.g., version of the external file, owner of external file), and other applications may not use it all.

Just like the BLOB column, you must specify a length for the Datalink type, but this length value has nothing to do with the size of the product brochure PDF document. Instead, the length should be big enough to store the longest expected URL string and the optional comment. So before you specify the length, decide if the comment value is needed, and then review the external file system housing the objects you intend on linking in order to determine the directory structure and length of the file names. If the initial Datalink column length is too small, you can use the Alter Table statement to increase its size.

Now that you've created a Datalink column, the next step is to populate the column, which I've demonstrated with the sample Insert statements found below. The first Insert statement only supplies the URL value for the referenced PDF file. In this case, the link type defaults to 'URL,' and the comment value defaults to an empty string (''). The second INSERT statement provides a comment string along with the URL for the referenced file object.

    INSERT INTO products_dl 
          VALUES(33, 'BOLTS', 'Really Good Bolts',
            DLVALUE('http://myserver1/subdir2/bolt_brochure.pdf'));

    INSERT INTO products_dl 
          VALUES(34,'SCREWS', 'Stainless steel screws',
DLVALUE('file://myrootdir/subdir/stainless_screws.pdf',
                    'URL','Created with Adobe 6.0'));

Notice that in both Insert statement examples, the Datalink value is supplied using the DLVALUE built-in function instead of a simple character string. One advantage of storing a file path name in a Datalink column as opposed to a character column is the availability of strong typing. With strong typing, a Datalink column forces all of the input file paths to come through the DLVALUE function, which validates the specified string to ensure it is a legitimate URL value. If an application attempts to assign a character string such as 'Random String' either directly to a Datalink column or as the location parameter on the DLVALUE function, the request will fail because it is lacking a valid URL. In contrast, a table using a simple character column to store file path values has no easy way to prevent an application from storing the string 'Random String' as a file path value.

DB2 also provides SQL functions to retrieve the encapsulated values from a Datalink column. Available functions include:

  • DLURLCOMPLETE
  • DLURLPATH
  • DLURLPATHONLY
  • DLURLSCHEME
  • DLURLSERVER
  • DLCOMMENT
  • DLLINKTYPE

If a Datalink column is accessed without one of these functions, DB2 by default returns the full URL value for the Datalink column. The table in Figure 3 [5] contains output values that would be returned by the following Select statement against the products_dl table:

SELECT prodBrochure, DLURLCOMPLETE(prodBrochure),
               DLURLPATH(prodBrochure), DLURLSCHEME(prodBrochure),
               DLURLSERVER(prodBrochure), DLCOMMENT(prodBrochure)
               DLLINKTYPE(prodBrochure)
  FROM products_dl  
  WHERE prodName='SCREWS'

These functions return the location of the external object so the application can make the data in the linked object available to the end user.

Some of the Datalink retrieval functions (e.g., DLURLPATH) will also return a file access token if the Datalink file control support has been activated by the FILE LINK CONTROL clause, which gives a Datalink column with DB2 control security and integrity constraints for a linked external file. With these added controls, you can manage external dependencies to ensure that a linked file in the IFS isn't deleted or renamed while it's linked to a Datalink column value in your database. (Note that the file link control support hasn't been widely adopted. For more details on this support, visit "Using Datalinks in DB2 UDB for AS/400," [6] August 2000, article ID 7767 at SystemiNetwork.com or the IBM Systems Infocenter.)

In addition to backing up DB2 tables, you will also need to ensure that all of the linked external file objects are saved to backup media. Because the Datalink feature doesn't include support for this, query the Datalink column to generate a list of the external objects you need to back up.

Logic Approach

When you need to access non-relational or legacy business data but don't need to store it within a DB2 table, try a user-defined table function (UDTF). A UDTF lets you call a program in any language supported by IBM i and have DB2 convert the program output into a relational format. So, if you have an existing program that extracts data from a stream file in the IFS, you can register the program as an external UDTF, which lets SQL developers write queries that process data stored in IFS files. That's quite a feat!

Using a UDTF is simple—the SQL developer only needs to know the name and attributes of the output fields returned by the UDTF. Here's a sample statement invoking the EmployeesDept UDTF to return information about employees in the specified department:

SELECT empno, lastname, location 
    FROM TABLE(employeesdept('503')) myudtf 

SQL developers don't need to care where the employee data is coming from. The data source could be a table, IFS file, data queue, etc. (In this case, the employee data is actually coming from a legacy S/36 file.) Figure 4 [7] shows the definition for the EmployeesDept UDTF. This function definition contains a single input parameter (department number) and returns seven different employee attributes. The UDTF calls the EMPDEPUDTF RPG program behind the scenes to extract the data from the S/36 file for the specified department number. For more information on using UDTFs with S/36 files, see "UDTFs: The Unsung DB2 Function," [8] October 2007, article ID 20788 at SystemiNetwork.com.

You can code an external UDTF to access just about any type of non-relational or complex data object, including data stored in IBM i proprietary data objects such as data areas or data queues. UDTFs are not limited to proprietary IBM i objects. (For information about using UDTFs with XML files, check out "Use a UDTF and SQL to Query XML Files," [9] January 2009, article ID 62656 at SystemiNetwork.com.)

What about XML?

While UDTFs provide one solution for processing data in an XML document, you can also use DB2 XML Extenders for this task. These two products (one for text, one for XML) provide a set of stored procedures and functions that give developers basic tools for storing, searching, and retrieving XML documents. The XML Extenders require the purchase of the DB2 Extenders licensed program product (5761-DE1), first available in 5.1.

When it comes to storing XML documents, the XML Extender also uses the LOB approach by providing a set of user-defined types based on the DB2 LOB data types. In addition to storing and retrieving XML documents, the XML Extender also includes procedures that can generate an XML document from existing columns in your database. Conversely, you can also extract or insert data into your database from an existing XML document. (For a more thorough understanding of the DB2 XML Extender capabilities, visit the IBM Redbook The Ins and Outs of XML and DB2 for i5/OS [10].)

Hopefully this article has opened your eyes to the number of DB2 for i features that are available to help you store and process complex and non-relational data as you modernize your database.

Kent Milligan (kmill@us.ibm.com [11]) is a senior DB2 for i specialist on IBM's ISV Enablement team. He spent the first eight years of his IBM career as a member of the DB2 development group in Rochester, Minnesota. He speaks and writes regularly about relational database topics.

© 2010 Penton Media, Inc.

Source URL: http://systeminetwork.com/article/use-sql-manage-complex-and-non-relational-data-db2

Links:
[1] http://systeminetwork.com/author/kent-milligan
[2] http://systeminetwork.com/files/63061_Fig1a_0.jpg
[3] http://systeminetwork.com/files/63061_Fig1b_0.jpg
[4] http://systeminetwork.com/files/63061.Fig2_.txt
[5] http://systeminetwork.com/files/63061.Fig3_.pdf
[6] http://systeminetwork.com/article/using-datalinks-db2-udb-as400
[7] http://systeminetwork.com/files/63061.Fig4_.txt
[8] http://systeminetwork.com/article/udtfs-unsung-db2-function
[9] http://systeminetwork.com/article/use-udtf-and-sql-query-xml-files
[10] http://www.redbooks.ibm.com/abstracts/sg247258.html
[11] mailto:kmill@us.ibm.com