2006/11/14

AXIOM - Fast and Lightweight Object Model for XML - Part 1

  • Eran Chinthaka
  • Software Engineer - WSO2

Introduction

Any application that handles high volumes of XML runs into memory and performance barriers, and the main culprit is usually the memory-intensive object model used inside the application. The AXis Object Model, better known as AXIOM, originated from the Apache Axis2 effort as a new lightweight and efficient object model for representing XML. It has been specifically engineered to be less memory-intensive by using deferred building.

Part 1 of this article looks into the architecture of AXIOM and explains how to navigate through the object model.

Why and What is AXIOM?

Current approaches to XML processing fall into two broad categories:

  • Tree-based approach - In this approach, which is followed by DOM (Document Object Model)-like APIs, the whole XML file is loaded into memory, and the resulting object model is larger than the source XML. Therefore, this method is not appropriate for memory constrained environments (J2ME, for example) or for systems handling large documents (a Web services engine, for instance).
  • Event-based approach - This approach processes the source XML in chunks and does not need to build complex memory structures. It can start working on receipt of the first byte of the source. But developer control over this approach is minimal, as event-based APIs feed the content to the application as they encounter it in the document, regardless of whether the application is ready to receive data or not.

One of the aims of the Streaming API for XML (StAX, JSR 173) was to overcome this developer control problem by letting the application pull events when it is ready for them. But you still cannot easily navigate forward and backward using an event-based approach. That is why developers prefer the tree-based approach over the event-based approach, even though it is memory inefficient.
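
To make the pull model concrete, here is a minimal sketch that uses only the StAX API bundled with current JDKs (javax.xml.stream). The class name FirstNamePull and the inline XML are illustrative assumptions, not part of AXIOM or of this article's sample code. The application decides when to ask for the next event and stops reading as soon as it has what it needs:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper for illustration; not part of AXIOM. */
final class FirstNamePull {

    /** Pulls events until the first Name element is found, then stops reading. */
    static String firstName(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event; nothing is pushed at it.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Name".equals(reader.getLocalName())) {
                        return reader.getElementText();
                    }
                }
                return null; // no Name element in the document
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee><Name>Dihini Himahansi</Name></Employee>"
                + "<Employee><Name>Thushari Damayanthi</Name></Employee></Employees>";
        System.out.println(firstName(xml));
    }
}
```

Because the loop returns at the first match, the second employee is never read. What the loop cannot do is go back to an element it has already passed, which is the navigation limitation described above.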

The AXis Object Model (AXIOM, also referred to as OM) was introduced to get the best of both worlds. AXIOM depends on the StAX API for the input and output of data. The important point is that it supports deferred building, which none of the existing object models do. That is, the model does not build the document until the application actually requires it. The object model contains only the parts that have already been built; the rest is still kept in the stream. Moreover, AXIOM can provide StAX events from any given point of the received document, whether that part has already been built or not. Furthermore, AXIOM can deliver these StAX events with or without building the memory model for later access. This is called caching in AXIOM. Even though AXIOM provides "flavors" of both approaches, care has been taken not to compromise performance.

The AXIOM API has been designed to be straightforward for a Java programmer. One of the major requirements of Apache Axis2, the next generation of the renowned Web service engine Apache Axis, was a very fast object model with a low memory footprint. AXIOM caters to that very requirement.

AXIOM Architecture

Figure 1: AXIOM Accessing the XML Stream

As Figure 1 shows, AXIOM accesses the XML stream through the StAX interface. The AXIOM binary release includes the Woodstox StAX parser, but you are free to use any other implementation of the StAX API.

Figure 2: Object Model Architecture

Figure 2 provides a deeper look into AXIOM. AXIOM "sees" the XML input stream through the StAX stream reader, which is wrapped by a builder. The current implementation has three builders:

  • StAXOMBuilder - builds a general XML model with full XML infoset support.
  • StAXSOAPModelBuilder - builds a SOAP-specific object model, which contains SOAP-specific objects such as SOAPEnvelope and SOAPHeader.
  • XOPAwareStAXOMBuilder - handles SOAP messages with MTOM (Message Transmission Optimization Mechanism) nodes.

Each of the builders supports deferred building and caching. The user can choose whether or not to build the memory model, and controls this by setting the cache to ON or OFF.

The AXIOM API works on top of the builder interface and provides the user with a convenient, yet powerful API. This gives a high degree of flexibility, as builders and object model implementations can be changed completely independently of one another. AXIOM has a defined set of APIs, and you can implement your own memory model based on them.

Currently, Axis2 comes with two implementations of this API. (There was an effort to build another implementation on a table-based model, but it was later discarded.) The first is based on a linked list model and is considered the default implementation of AXIOM. The other, referred to as DOOM (DOM over OM, that is, Document Object Model over Object Model), layers DOM interfaces over AXIOM objects and has proven capable of providing DOM capability on AXIOM. Axis2 uses DOOM to provide WS-Security support, as most implementations of security standards, such as XML Security and Canonicalization, are already built on DOM interfaces.

To work with different implementations, AXIOM uses a factory to create AXIOM objects, which makes it possible to switch between different implementations of the object model. The factory is designed so that, if no specific AXIOM implementation is given, it automatically picks the default one from the classpath.

Figure 3: OM API and OM Factory

Using AXIOM

The AXIOM implementation is now at version 1.2, and it has full XML infoset support.
To make things more convenient, AXIOM can be customized into different XML object models. For example, the StAXSOAPModelBuilder makes AXIOM build a SOAP object model.

Getting AXIOM Binaries

The easiest way to obtain the AXIOM binaries is to download the binary distribution. The latest release available at the time of writing is AXIOM 1.2. The build directory contains axiom-api-1.2.jar and axiom-impl-1.2.jar. In addition, AXIOM requires a parser implementing the StAX API, as well as logging, XPath, and MTOM capabilities. All of AXIOM's dependencies are distributed with the binary release and are available inside its lib folder.

Adventurous users can build AXIOM from the source release.

Let's take the following XML as our example. This XML file can be found at axiom-part1/resources/sample.xml inside the sample code.

    
    <ns1:EmployeeInformation xmlns:ns1="https://www.axiom.org/article/oxygentank">
        <ns1:Employee>
            <Name>Dihini Himahansi</Name>
            <Division>Engineering</Division>
            <Address type="Home">
                <City>Ambalangoda</City>
                <Country>Sri Lanka</Country>
            </Address>
        </ns1:Employee>
        <ns1:Employee>
            <Name>Thushari Damayanthi</Name>
            <Division>Business Development</Division>
            <Address type="Temporary">
                <City>Rajagiriya</City>
                <Country>Sri Lanka</Country>
            </Address>
        </ns1:Employee>
    </ns1:EmployeeInformation>

Getting a Builder

First we need to create a builder instance to parse the above XML file. Code listing 1 demonstrates how to create one:

Code Listing 1 - Creating a builder from a given input stream

    
    XMLStreamReader xmlStreamReader =
            XMLInputFactory.newInstance().createXMLStreamReader(inputStream);
    StAXOMBuilder stAXOMBuilder = new StAXOMBuilder(xmlStreamReader);

or simply

    StAXOMBuilder stAXOMBuilder = new StAXOMBuilder(inputStream);

First, create an XMLStreamReader instance from the input XML file or stream.

Next, create an instance of StAXOMBuilder. Note that you have the option of passing an instance of OMFactory to the builder. The OMFactory lets you switch between different AXIOM API implementations without changing any of the code that uses the object model. If no OMFactory is passed, the default linked list implementation is assumed.

Even though you have created a builder and a reader so far, no part of the object model for the received XML has been built in memory yet.

Accessing Element Information in the XML

Now let's try to read the contents of the XML using the AXIOM API. First let's retrieve all the <Employee> element information items from the XML.

First the document element has to be taken from the builder. This will be an instance of OMElement. Then you can either call its getChildren() method, as this particular XML only has <Employee> elements under the document element, or retrieve the children that have a given QName. Code listing 2 demonstrates the second method of accessing children by searching for a QName.

Code Listing 2 - Retrieving children of a given QName

    OMElement documentElement = stAXOMBuilder.getDocumentElement();

    QName employeeQName = new QName("https://www.axiom.org/article/oxygentank", "Employee");
    Iterator employeeElementsIter = documentElement.getChildrenWithName(employeeQName);

You will notice that most of the methods take a QName or an OMNamespace (AXIOM's representation of a namespace) as a parameter, as we are in the practice of promoting the use of namespaces everywhere.
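
As a small aside on how QName matching behaves, the sketch below uses only javax.xml.namespace.QName from the JDK; the class name QNameMatching is made up for illustration. Two QNames are equal when their namespace URI and local part match, and the prefix plays no role in the comparison:

```java
import javax.xml.namespace.QName;

/** Hypothetical demo class; only javax.xml.namespace.QName is a real API here. */
final class QNameMatching {
    static final String NS = "https://www.axiom.org/article/oxygentank";

    // Same namespace URI and local part, with and without a prefix.
    static final QName WITH_PREFIX = new QName(NS, "Employee", "ns1");
    static final QName NO_PREFIX = new QName(NS, "Employee");

    // Local part only: the namespace URI defaults to the empty string.
    static final QName UNQUALIFIED = new QName("Employee");

    public static void main(String[] args) {
        System.out.println(WITH_PREFIX.equals(NO_PREFIX));   // true: the prefix is ignored
        System.out.println(NO_PREFIX.equals(UNQUALIFIED));   // false: namespace URIs differ
        System.out.println(UNQUALIFIED.getNamespaceURI().isEmpty()); // true
    }
}
```

This is why the namespace URI must be supplied when looking up ns1:Employee in the sample document, while a child such as Name, which is in no namespace, can be looked up with new QName("Name").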

documentElement.getChildrenWithName(employeeQName) returns an iterator over all the child elements of the document element that have the name employeeQName. The beauty of this design is that the returned iterator holds no information until it is asked for it. When the iterator needs more information, it asks the builder to build it. There are many enhancements like this within AXIOM to make it as lightweight as possible without compromising performance.

Having got hold of the iterator, let's iterate over the employees and print out their names.
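
The idea of an iterator that drives the parser on demand can be sketched with plain StAX. The following is not AXIOM's implementation; LazyChildTexts and its startElementsSeen() counter are hypothetical names introduced only to show that parsing advances no further than the caller has asked for:

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical illustration of on-demand parsing; not an AXIOM class. */
final class LazyChildTexts implements Iterator<String> {
    private final XMLStreamReader reader;
    private final String elementName;
    private int startElementsSeen = 0; // how far the parser has been driven
    private String pending;            // text of the next match, once located
    private boolean hasPending = false;

    LazyChildTexts(String xml, String elementName) {
        try {
            this.reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Cannot read XML", e);
        }
        this.elementName = elementName;
    }

    /** Number of start-element events pulled from the parser so far. */
    int startElementsSeen() {
        return startElementsSeen;
    }

    @Override
    public boolean hasNext() {
        if (hasPending) {
            return true;
        }
        try {
            // Drive the parser only until the next matching element.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    startElementsSeen++;
                    if (elementName.equals(reader.getLocalName())) {
                        pending = reader.getElementText();
                        hasPending = true;
                        return true;
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        return pending;
    }

    public static void main(String[] args) {
        LazyChildTexts names = new LazyChildTexts(
                "<Employees><Employee><Name>Dihini</Name></Employee>"
                        + "<Employee><Name>Thushari</Name></Employee></Employees>", "Name");
        System.out.println("before: " + names.startElementsSeen());
        System.out.println(names.next() + ", after: " + names.startElementsSeen());
    }
}
```

Creating the iterator pulls no element events at all; each call to next() moves the parser just far enough to produce one more result. AXIOM goes further than this sketch by also keeping the nodes it has built, so they can be navigated again later.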

Code Listing 3 - Iterating over children and retrieving children

    while (employeeElementsIter.hasNext()) {
        OMElement employee = (OMElement) employeeElementsIter.next();
        OMElement name = employee.getFirstChildWithName(new QName("Name"));
        System.out.println("EmployeeName = " + name.getText());
    }

The <Name> element, which is a child of <Employee>, contains the name of the employee. Retrieve the <Name> element by calling employee.getFirstChildWithName(new QName("Name")). getFirstChildWithName() comes in handy when you only want the first occurrence of a particular element.

When you have the <Name>, simply calling the getText() method gives you the text content of that element, which is the name of the employee.

Code listing 4 demonstrates the way to access the attributes of an OMElement.

Code Listing 4 - Accessing Attributes

    while (employeeElementsIter.hasNext()) {

        // get the employee element
        OMElement employee = (OMElement) employeeElementsIter.next();

        // get the address of the employee
        OMElement address = employee.getFirstChildWithName(new QName("Address"));

        // get the name of the employee
        String employeeName = employee.getFirstChildWithName(new QName("Name")).getText();

        // get the address type by looking at the type attribute in Address element
        String addressType = address.getAttributeValue(new QName("type"));

        System.out.println("Employee " + employeeName + "'s address is a " + addressType + " address");
    }

Accessing attributes inside an element is similar to accessing the children of the element. If you want to retrieve the value of an attribute with a known name, call getAttributeValue(attrQName). AXIOM has the notion of an OMAttribute to represent the QName and the value of an attribute. We will explore OMAttribute further when we create an AXIOM tree programmatically.

Serializing

AXIOM reads its input through the XMLStreamReader interface of the StAX API and writes its output through the XMLStreamWriter interface. However, the toString() method of OMElement is overridden, so you don't need to worry about XMLStreamWriter (see code listing 5). If you want to write your OM tree directly to an output stream, consider using XMLStreamWriter (see code listing 6).

Code Listing 5 - Printing the XML to a string

    System.out.println("documentElement = " + documentElement);

Code Listing 6 - Printing the XML using XMLStreamWriter

    XMLStreamWriter xmlStreamWriter =
            XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
    documentElement.serialize(xmlStreamWriter);
    // flush so that any buffered output actually reaches System.out
    xmlStreamWriter.flush();
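
For readers who have not used XMLStreamWriter before, here is a small JDK-only sketch of the interface that AXIOM serializes through. It does not use AXIOM; the class WriterSketch and the method addressXml are hypothetical, and the writer targets a StringWriter so that the result can be inspected as a string:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical demo of the StAX writer interface; not AXIOM code. */
final class WriterSketch {

    /** Writes an Address element with a type attribute and a City child. */
    static String addressXml(String type, String city) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("Address");
            writer.writeAttribute("type", type);
            writer.writeStartElement("City");
            writer.writeCharacters(city);
            writer.writeEndElement(); // closes City
            writer.writeEndElement(); // closes Address
            writer.flush();           // push buffered output to the StringWriter
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addressXml("Home", "Ambalangoda"));
    }
}
```

The same writer could just as well wrap an OutputStream such as System.out or a socket stream, which is what makes it a natural target for serializing a tree without first building one large string.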

AXIOM has another way of "spitting" out the data inside it: any of its elements can emit StAX events directly. If you prefer to work with StAX events, especially to reduce memory use, AXIOM can handle that too. Call documentElement.getXMLStreamReader() to get the StAX events of the document element. If you are interested in the events of only a child element, find that element and call its getXMLStreamReader() method.

Even though this seems easy, AXIOM handles a lot of complexity underneath, transparently to the user. Let's assume the object model is partially built and you have only accessed the first Employee element. When you ask for StAX events from the document element, AXIOM first emits events from the part of the object model that has been built in memory. Once that is finished, it switches to the underlying StAX parser and starts emitting events directly from it.
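
The switch from in-memory events to parser events can be illustrated with a toy model. This is only a sketch of the idea under simplifying assumptions (events are reduced to strings, only element events are tracked, and every pulled event is cached); CachedThenLive is a hypothetical class, not AXIOM code:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Toy model only; AXIOM's real builder and nodes are far richer. */
final class CachedThenLive {
    private final XMLStreamReader parser;
    private final List<String> built = new ArrayList<>(); // the part already "in memory"

    CachedThenLive(String xml) {
        try {
            parser = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Cannot read XML", e);
        }
    }

    /** Pulls the next element event from the parser, caches it, and returns it (null at the end). */
    String buildNext() {
        try {
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    return remember("START " + parser.getLocalName());
                }
                if (event == XMLStreamConstants.END_ELEMENT) {
                    return remember("END " + parser.getLocalName());
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    private String remember(String event) {
        built.add(event);
        return event;
    }

    /** All events from the start of the document: cached ones first, then the rest from the parser. */
    List<String> eventsFromStart() {
        List<String> events = new ArrayList<>(built); // 1. replay what was already built
        String event;
        while ((event = buildNext()) != null) {       // 2. continue from the live parser
            events.add(event);
        }
        return events;
    }

    public static void main(String[] args) {
        CachedThenLive doc = new CachedThenLive("<Employees><Employee/><Employee/></Employees>");
        doc.buildNext(); // the document element has been built
        doc.buildNext(); // the first Employee has been accessed
        System.out.println(doc.eventsFromStart());
    }
}
```

The caller of eventsFromStart() sees one continuous sequence and cannot tell where the cached part ends and the live part begins, which is the transparency described above.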

Namespace Handling

AXIOM has its own OMNamespace class to handle namespaces, but this does not prevent you from using the conventional QName. You can either use an OMNamespace to declare a namespace, or use the createOMElement(localName, namespaceURI, namespacePrefix) or createOMElement(qname, parent) methods.

Namespaces can be declared on an OMElement using the declareNamespace(uri, prefix) or declareNamespace(OMNamespace) methods.

Conclusion

This article introduced the basic concepts of AXIOM. I strongly recommend that curious users have a peek at the current sources or download the binary distribution and play with it. The next article in this series will concentrate on caching of data and the optimizations done to make AXIOM work much better in handling SOAP messages.


Author

Eran Chinthaka, Senior Software Engineer, WSO2 Inc. chinthaka(at!)wso2(dot!)com

 
