AIX Tip of the Week

XML Databases

By Danny Kalev

Date: September 27, 2002

A relational database stores information in tables, with the ability to relate information from one table to information in another table. The primary data unit in relational databases is a column, which may contain one or more fields. While this model operates well under traditional data oriented systems, it isn't ideal for manipulating XML data. Database vendors have come up with several solutions to this problem. I will discuss two representative products that are available on Linux.

Data-centric and Document Centric XML

Document-centric XML documents typically contain essays, poems, letters etc. In these documents, the primary data unit is the entire document, rather than individual fields. By contrast, data-centric documents can be an employee's information or a catalog item. Human beings are not intended to read such documents in their raw form, but rather they are created and used by software application. In such documents, individual nodes contain meaningful information even when used outside the context of their document, an employee's name for example.

XML-enabled Databases

Relational databases that offer special capabilities for dealing with XML data are known as "XML-enabled databases". They map XML data to traditional rows and columns internally. The simplest mapping model simply stores the original XML document in a table column as a Large OBject, or LOB. This mapping model isn't ideal for performing sophisticated queries based on specific elements of an XML document since the database engine isn't aware of its structure. IBM's DB2 uses "side tables" to solve this problem. A side table contains additional indexes that point to specific elements in the original XML document.

Alternatively, a side table contains the navigation scheme of the original document (known as "XPath" in XML-parlance). However, neither of these methods is truly relational. If we wish to change a single node, we need to retrieve the entire document value, change it, and write it back. This approach is therefore more suitable for document-centric applications.

Object-relational Storage

A second mapping model breaks the elements of an XML document into individual table rows, or fields. This way, the database is aware of the original document's structure. Oracle's 9i database implements this approach. The DBA can decide which documents should be mapped to a table schema and which documents should be stored as LOBs. Equipped with an array of XML-related software tools such as XML Class Generator, XML SQL Utility and XML Parser, Oracle's 9i handles document-centric and data-centric XML equally well.

Bruce Spencer,
baspence@us.ibm.com