Extensible Markup Language (XML) is a text-based method for describing, storing, and delivering information ranging from straight text to complex binary data. XML has a number of advantages over proprietary storage formats such as DB2 Universal Database for AS/400 (UDB/400) and SQL Server. Because it’s text based, it’s easy to process and serve. It’s also easier for people and computers to read than proprietary formats, although few people will look at XML directly. Because it’s simple, extensible, and soon to be universal, it will make sharing information between applications, databases, and clients easier.
Many companies have been quick to recognize XML’s power. IBM is planning to support it throughout many of its product lines. Microsoft has incorporated some XML support into such mainstay revenue streams as Office 2000. Other industry-leading software vendors are following suit.
Why is XML so hot? Let’s find out.
Inside XML
XML is a markup language, meaning that it consists of tags such as <ROOT_ELEMENT> that are embedded in a file of text or data and that describe the content or presentation format of the text or data. You may already be familiar with XML’s sister language, Hypertext Markup Language (HTML), which consists of tags that describe the layout of a Web page. The parent of both of these markup languages, Standard Generalized Markup Language (SGML), is a complex, powerful language that’s widely used in the mainframe world but that hasn’t found its way into mainstream data processing due to its complexity.
HTML is an adequate language for describing a document’s organization and appearance, but it’s limited by design. It has a fixed set of markup tags, most of which control the appearance of text rather than its structure or meaning. For example, in the HTML snippet <I>test</I>, the <I> tag indicates that the word "test" should be displayed in italicized type.
An example of HTML’s limitations that you’ve probably experienced firsthand is the language’s inadequacy in describing documents so that search engines can index them easily and correctly. Trying to locate specific information on the Internet can be like looking for a needle in a haystack, even with the help of search engines.
XML steps in where HTML leaves off by providing a way to describe the structure and semantics of the data contained in a document such as an HTML page. With XML, an industry group of online automobile merchants could decide to implement a <CARPRICE></CARPRICE> tag for car prices, so that a search for "price" would return dollar amounts.
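To make the idea concrete, here is a minimal sketch in Python that uses the standard library's xml.etree.ElementTree parser to pull every price out of a small listing. Only the <CARPRICE> tag comes from the example above; the <CARLISTING>, <CAR>, and <MODEL> tags, the sample data, and the find_prices helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing; only the <CARPRICE> tag comes from the article.
LISTING = """<?xml version="1.0"?>
<CARLISTING>
  <CAR><MODEL>Sedan</MODEL><CARPRICE>18500</CARPRICE></CAR>
  <CAR><MODEL>Coupe</MODEL><CARPRICE>22750</CARPRICE></CAR>
</CARLISTING>"""

def find_prices(xml_text):
    """Return the text of every <CARPRICE> element as an integer."""
    root = ET.fromstring(xml_text)
    return [int(el.text) for el in root.iter("CARPRICE")]

prices = find_prices(LISTING)
print(prices)
```

Because the price is labeled by a tag rather than by its position or typeface, the search does not have to guess which numbers on the page are prices.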
XML doesn’t replace HTML because XML has no features of its own for describing the appearance of data. (Thus, when you want to display XML data, you use XML in conjunction with another markup language, such as HTML, to visually format the information.) Rather, XML lets you describe the structure of data items contained in a document so that you can consistently manipulate the document using those data items. XML accomplishes this by letting you create new tags that are specific to some type of data or application. Because of this extensibility, XML is sometimes referred to as a metalanguage.
XML’s extensibility may imply complexity, but for most applications, you work with a set of tags (defined using XML, of course) specific to the application or your industry. Thus, using XML in most cases is no more complex than using a more limited tag language, such as HTML. On the other hand, when the need arises to describe some new kind of data, XML lets you do this too.
XML is extremely useful for e-commerce because it lets industries define their own custom languages for exchanging data. For example, you could use XML to create a markup language (i.e., a set of tags) for defining purchase orders. Because XML tags and data are encoded in Unicode, nearly any computer in the world is technically capable of reading them, so you could exchange purchase orders with anyone using the same markup language. XML documents need not be displayed in a browser; XML’s data-organizing capabilities make it a great tool for machine-to-machine or application-to-application data exchange. Because XML is a self-describing language, any recipient of an XML document can make sense of the data in the document, even if the recipient has never seen that particular type of data before.
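The purchase order scenario might look something like the following Python sketch. It is an illustration, not an industry standard: the PURCHASE_ORDER, BUYER, ITEMS, ITEM, and QUANTITY tags, the NUMBER and SKU attributes, and the build_purchase_order helper are all hypothetical names. The sender builds the document and serializes it as UTF-8, and the receiver reads it back with nothing more than a generic XML parser.

```python
import xml.etree.ElementTree as ET

def build_purchase_order(po_number, buyer, items):
    """Build a purchase order tree; all tag names here are illustrative."""
    order = ET.Element("PURCHASE_ORDER", {"NUMBER": po_number})
    ET.SubElement(order, "BUYER").text = buyer
    lines = ET.SubElement(order, "ITEMS")
    for sku, quantity in items:
        item = ET.SubElement(lines, "ITEM", {"SKU": sku})
        ET.SubElement(item, "QUANTITY").text = str(quantity)
    return order

order = build_purchase_order("PO-1001", "Acme", [("WIDGET-7", 12), ("BOLT-3", 400)])

# Serialize to UTF-8 bytes, a Unicode encoding that every XML parser must accept.
po_bytes = ET.tostring(order, encoding="utf-8")

# The receiving side needs only a generic XML parser to read the order back.
received = ET.fromstring(po_bytes)
quantities = {
    item.get("SKU"): int(item.findtext("QUANTITY"))
    for item in received.iter("ITEM")
}
print(quantities)
```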
You might be thinking that if anyone can use XML to make up a markup language, isn’t XML just a recipe for chaos? No, because the creation of XML tags isn’t meant to be undertaken lightly. The expectation is that industry organizations and standards groups will define application-specific languages and promulgate these as individual standards. The advantage of XML is that everyone will be using the same metalanguage to describe application-specific tags. This greatly simplifies the development of platform support for a variety of tag languages. IBM, Microsoft, and other vendors can provide XML parsers and other tools as part of their operating systems, Web servers, and database management systems, and these tools will work with all the different XML-derived tag languages.
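The point about shared tools can be sketched in a few lines of Python. In the example below, one generic function built on the standard library's parser handles two unrelated tag languages without knowing anything about either. Both vocabularies and the tag_counts helper are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_counts(xml_text):
    """Count how often each element name appears, whatever the vocabulary."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())

# Two unrelated, made-up tag languages handled by the same parser.
ORDER = "<ORDER><LINE/><LINE/><TOTAL>40</TOTAL></ORDER>"
PATIENT = "<PATIENT><NAME>Pat</NAME><VISIT/></PATIENT>"

order_tags = tag_counts(ORDER)
patient_tags = tag_counts(PATIENT)
print(order_tags, patient_tags)
```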
Putting XML to Work
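Figure 1, below, shows a customer database record encoded as an XML document (a complete piece of information represented in XML format). Document tags such as <CUSTOMER> and <NAME> mark the start of each element, and matching end tags such as </CUSTOMER> and </NAME> close it.
Figure 1: XML Document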
<?xml version="1.0"?>
<ROOT_ELEMENT>
  <CUSTOMERS>
    <CUSTOMER ID="123">
      <NAME>Acme</NAME>
      <ADDRESS>123 Any Street</ADDRESS>
      <CITY>Yourtown</CITY>
      <STATE>FL</STATE>
      <ZIP>98578</ZIP>
    </CUSTOMER>
  </CUSTOMERS>
</ROOT_ELEMENT>
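As a rough sketch of how a program might consume the Figure 1 document, the Python snippet below parses it with the standard library's xml.etree.ElementTree module and turns each <CUSTOMER> element into a dictionary. The read_customers helper is a hypothetical name; the tags and data are those of Figure 1.

```python
import xml.etree.ElementTree as ET

# The document from Figure 1, reproduced as a string.
FIGURE_1 = """<?xml version="1.0"?>
<ROOT_ELEMENT>
  <CUSTOMERS>
    <CUSTOMER ID="123">
      <NAME>Acme</NAME>
      <ADDRESS>123 Any Street</ADDRESS>
      <CITY>Yourtown</CITY>
      <STATE>FL</STATE>
      <ZIP>98578</ZIP>
    </CUSTOMER>
  </CUSTOMERS>
</ROOT_ELEMENT>"""

def read_customers(xml_text):
    """Return one dict per <CUSTOMER>, combining its ID attribute and child elements."""
    root = ET.fromstring(xml_text)
    customers = []
    for customer in root.iter("CUSTOMER"):
        record = {"ID": customer.get("ID")}
        for field in customer:
            record[field.tag] = field.text
        customers.append(record)
    return customers

customers = read_customers(FIGURE_1)
print(customers[0]["NAME"], customers[0]["ZIP"])
```

Note that the ID arrives as an attribute of <CUSTOMER>, while the name and address arrive as child elements; the sketch simply folds both into one record.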
XML documents that conform to XML rules and grammar are said to be well formed. A well-formed document that also follows a document type definition (DTD) is said to be valid. A DTD lays out the application-specific elements and the relationships of the elements in an XML document for a particular industry or group. Groups interested in sharing information create DTDs that define how the information they want to share is structured, and those DTDs then become de facto standards for the group. For example, there are DTDs for marking up EDI, financial information, health care, data modeling, mathematics, chemistry, and many other applications.
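The well-formedness half of that distinction is easy to check in code. The Python sketch below asks the standard library's parser whether a piece of text is well formed; is_well_formed is a hypothetical helper name. Keep in mind that xml.etree.ElementTree is a non-validating parser, so this check says nothing about whether a document is valid against a DTD. That job calls for a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = '<CUSTOMER ID="123"><NAME>Acme</NAME></CUSTOMER>'
bad = '<CUSTOMER ID="123"><NAME>Acme</CUSTOMER>'  # <NAME> is never closed

print(is_well_formed(good), is_well_formed(bad))
```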
The XML Bandwagon
The World Wide Web Consortium (W3C), the organization that controls the standards for HTML, recommended XML as a standard in February 1998. W3C’s recommendations are often used as guidelines for implementing new Web-based technologies, and a recommendation can be the first step toward widespread acceptance by vendors and developers and toward support on many platforms.
A quick visit to IBM’s XML Web site (http://www.ibm.com/xml) indicates that IBM is paying attention to XML as well. IBM’s site offers several XML applications for download, including an application for converting a DTD to a JavaBean. There’s also a free XML parser written in Java that you can incorporate into your own apps. IBM is also investing in XML for the AS/400 (more about this in a minute).
Microsoft is adding XML support to many of its products and encouraging developers to use it as well (http://msdn.microsoft.com/xml/default.asp). Internet Explorer 4.0 includes rudimentary XML support; 5.0 contains additional XML capabilities and can display XML documents directly in the browser.
Other vendors are also supporting XML. Lotus is adding XML capabilities to Domino. Netscape announced XML support in the new Gecko layout engine, which will be part of Communicator 5.0. And Corel is implementing XML in the next release of WordPerfect Suite.
The AS/400 and XML
"XML support" can mean different things. Because XML tags and data are Unicode text, the AS/400 can support XML at the most basic level by storing XML documents in the IFS and serving them via an HTTP server. However, IBM wants XML to play a larger role in the AS/400’s future. WebSphere 1.0 has been able to run the Java-based XML parsers and serve Java servlets that encode/decode and display XML since V4R3. For embedding XML in applications, the AS/400 Toolbox for Java includes support for the XML markup languages Panel Definition Markup Language (PDML) and Program Call Markup Language (PCML). Java developers can use PDML tags to describe user interfaces and PCML tags to describe AS/400 program calls from Java applications.
In the future, IBM plans to provide enablers that will encode 5250 data streams in XML and convert UDB/400 data into XML. IBM’s "pervasive computing" initiative relies heavily on XML to provide data to various computing devices, from traditional PCs to handheld personal digital assistants (PDAs) to mobile phones. IBM Rochester wants the AS/400 to be able to serve data to all the devices you may be using in the future, and the division sees XML as a means to accomplish that goal.
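To give a feel for what converting database data into XML involves, here is a minimal Python sketch. It is not IBM's enabler and makes no claim about how that will work: the rows_to_xml helper is hypothetical, and the sample rows are made-up data standing in for a query result. The tag names are borrowed from Figure 1.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, table_tag, row_tag):
    """Wrap each row (a dict of column name -> value) in its own element."""
    table = ET.Element(table_tag)
    for row in rows:
        record = ET.SubElement(table, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(table, encoding="unicode")

# Hypothetical rows standing in for the result of a database query.
rows = [
    {"NAME": "Acme", "CITY": "Yourtown", "STATE": "FL"},
    {"NAME": "Bolt Co", "CITY": "Othertown", "STATE": "MN"},
]

xml_text = rows_to_xml(rows, "CUSTOMERS", "CUSTOMER")
print(xml_text)
```

The sketch assumes every column name is also a legal XML element name; real column names may need to be mapped before they can be used as tags.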
XML is an enabling technology, designed to describe data to machines and facilitate communications between them. For more information about XML and whether it’s being used in your industry, visit the Web sites mentioned in XML Resources (below). Although you may not use it directly (you may just use the functionality it enables), XML promises to ease many of the sometimes overly complex processes involved in exchanging data.
XML Resources
World Wide Web Consortium XML site (includes the XML source specification)
http://www.w3c.org/xml
IBM’s main XML site
http://www.ibm.com/xml
Microsoft’s XML site
http://msdn.microsoft.com/xml/default.asp
Industry-specific usage of XML and other information
http://www.oasis-open.org/cover/xml.html#applications
Brian Singleton is a senior technical editor for NEWS/400 and the custodian of the Tips and Tech Community on the NEWS/400 Web site. You can reach him at bsingleton@news400.com.