Understanding XML databinding within Web Services

  • By Eran Chinthaka
  • 7 Nov, 2007

Applies To

Apache Axis2/Java v1.3

 

Introduction

Web services have become an important technology for interacting with remote applications and information sources. As Web services have spread, many tools have emerged to help both Web service users and authors. One tool that users consistently expect from a Web services toolkit is databinding. It lets programmers work with XML in a friendlier and easier manner than a generic API (such as DOM). Programmers who are used to object-oriented code, in particular, can use a databinding tool to manipulate the content of an XML document in an object-oriented manner.

Let's explore how databinding can help make Web services a part of your programming toolset.

What is Databinding and How it Works

When we interact with a Web service, we send and receive SOAP messages. But constructing SOAP messages can be a bit complicated for many programmers. RMI (Remote Method Invocation) introduces a simpler paradigm for interacting with remote services. Users call methods in local objects, and the underlying toolkit converts these calls to SOAP messages, hiding much of the complexity of the message exchange from the programmer. This simplicity makes RMI a popular way to access Web services.

From the service author's side, most of the Web services deployed today are written in a particular programming language. For example, one can write a StockQuote service in Java or C# and might want to expose it as a Web service, making it available to clients written in other programming languages. Web services databinding tools facilitate the deployment of services written in a particular language by performing the conversion from SOAP messages to language-specific constructs, and vice versa.

Let's take a concrete example. A Java object requires information about a person to carry out some operation, say user registration.

Consider the following Java class representing a person.

public class Person {
    private String name;
    private int age;
    private String sex;
    // here you will write the getter and setter methods for each of the above attributes
}

Code Listing 1 : A Person class

Assume the Java object which registers users generates a Person object as shown in code listing 1. When you expose this method as a Web service, the data contained in the Java object must be converted to a language-independent format, namely XML. In the other direction, when an object wishes to consume a Person object but is invoked using XML through a Web service framework, the XML must be converted into the expected Person object. Databinding is all about converting between XML and programming-language specific data structures, in both directions.

Obviously your next question will be how the conversion actually takes place. How does a Web services framework create a Person class from the received XML and vice versa? What rules govern the databinding process?

When you expose a service as a Web service, you also need to publish a WSDL (Web Services Description Language) document describing your service. This WSDL file defines the structure of the messages that the service can send and receive. Within the WSDL you can find descriptions of the XML structure (schema) of the messages that are exchanged. XML Schema, as you might already know, defines rules for the content of an XML document. A section of the WSDL is devoted to type definitions, containing an XML Schema language description of the messages. (Please see the tutorial from w3schools for an introduction to XML Schema.) For example, the following schema describes an XML fragment which has a similar structure to the Person class in code listing 1.

<xs:element name="Person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="age" type="xs:int"/>
            <xs:element name="sex" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

Code Listing 2 : Schema of the Person class
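To make the idea of "rules for the content of an XML document" concrete, here is a minimal sketch that checks instance documents against the schema in code listing 2, using the standard javax.xml.validation API from the JDK. The class name PersonSchemaCheck and the sample documents are assumptions made for illustration, and the fragment is wrapped in an xs:schema root element (which a WSDL types section would normally supply) so that it forms a complete schema document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Axis2 or ADB.
class PersonSchemaCheck {
    // Code listing 2, wrapped in an xs:schema root so it is a complete schema.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Person\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"name\" type=\"xs:string\"/>"
      + "<xs:element name=\"age\" type=\"xs:int\"/>"
      + "<xs:element name=\"sex\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the given XML text conforms to the Person schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no custom error handler, a validation error raises SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conforms to the schema: three children, in order, with an integer age.
        System.out.println(isValid(
            "<Person><name>Alice</name><age>30</age><sex>F</sex></Person>"));
        // Violates the schema: "thirty" is not an xs:int.
        System.out.println(isValid(
            "<Person><name>Alice</name><age>thirty</age><sex>F</sex></Person>"));
    }
}
```

A databinding tool relies on the same rules: because the schema promises that age is an xs:int, the generated class can safely expose it as a Java int.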

Databinding tools examine the WSDL, extract and process the schema, and generate a set of language-specific classes that represent the information from the schema. For example, the ADB (Axis DataBinding) framework builds a set of classes from a schema, in which each individual class includes methods defining how the data members can be serialized into XML and how they can be repopulated from XML. Converting from XML to objects is referred to as de-serialization or unmarshalling. Similarly, converting from an object to XML is referred to as serialization or marshalling.

Some frameworks adopt a different approach to the serialization/de-serialization of information. Those frameworks generate two sets of classes for a given schema document. One set of classes, referred to as "Beans", will provide the object view of the XML. They will contain only the data within the XML. For example, code listing 1 contains the "Bean" class that will be generated for the schema shown in code listing 2.

The other set of classes, also referred to as helper classes, is responsible for the serialization and de-serialization of those bean classes. The helper classes allow the bean classes to contain only the getter and setter methods, without cluttering each class with complex serialization and de-serialization code. This lets programmers who use XML databinding interact only with bean classes, which are simple and easy to understand and use.
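The bean and helper split can be sketched by hand as follows. This is an illustration only, not the code that ADB or any other framework actually generates: the names PersonBean and PersonBeanHelper are hypothetical, and the helper uses the JDK's StAX API (javax.xml.stream) for both directions. It assumes every field of the bean is set before serialization.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Bean: data and accessors only, mirroring code listing 1.
class PersonBean {
    private String name;
    private int age;
    private String sex;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getSex() { return sex; }
    public void setSex(String sex) { this.sex = sex; }
}

// Hypothetical helper: all XML handling lives here, none in the bean.
class PersonBeanHelper {
    // Serialization (marshalling): bean to XML text.
    static String serialize(PersonBean p) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("Person");
        writeChild(w, "name", p.getName());
        writeChild(w, "age", Integer.toString(p.getAge()));
        writeChild(w, "sex", p.getSex());
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    private static void writeChild(XMLStreamWriter w, String tag, String text)
            throws XMLStreamException {
        w.writeStartElement(tag);
        w.writeCharacters(text);
        w.writeEndElement();
    }

    // De-serialization (unmarshalling): XML text to bean.
    static PersonBean deserialize(String xml) throws XMLStreamException {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        PersonBean p = new PersonBean();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                switch (r.getLocalName()) {
                    case "name": p.setName(r.getElementText()); break;
                    case "age":  p.setAge(Integer.parseInt(r.getElementText().trim())); break;
                    case "sex":  p.setSex(r.getElementText()); break;
                    default: break; // the enclosing Person element
                }
            }
        }
        r.close();
        return p;
    }
}
```

Application code then only touches the bean, for example calling setName and setAge on a PersonBean and handing it to PersonBeanHelper.serialize when XML is needed. Moving the two helper methods into the bean itself would give the single-class style described earlier.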

The ADB framework that comes with Apache Axis2 can be configured to generate classes both ways.

You can see here the generated code for the above Person schema, using the ADB databinding framework. The Person class, as you can see, contains both the getter/setter methods and the serialization and de-serialization logic. Towards the beginning of this class you will be able to see the getter and setter methods. The rest is the complex logic to handle various schema constructs and to serialize and de-serialize the Person class.

How Databinding Works

This article has touched on two aspects of databinding. One is the runtime conversion of data between XML and language structures (Java objects, for example). The other is the design-time activity of generating the code that performs those runtime conversions. Both activities are sometimes called databinding, but strictly the term refers to what happens at runtime. The code generation explained above is what enables the runtime databinding.

It is also worth knowing how these databinding frameworks read and write XML documents. This is helpful especially if you want to feed in your own XML structures or to serialize the beans to a specific format that you need. XML reading and writing is generally driven by an event-based approach. During the reading process, an XML parser reads the XML document and generates events corresponding to its content. A set of builders catches these events, builds the object model, and populates it with the data from the XML.

When we serialize the object model, the databinding framework can either provide an event-based interface or write XML directly to the output medium.

A common way to read an XML document is with SAX events. A SAX parser emits SAX events corresponding to the XML structure it finds, and the de-serialization logic is responsible for generating the object model from them. The problem with SAX parsing is that the parser streams through the whole document, emitting each event just once. You cannot randomly access an XML document through such a fixed event stream. So when an object model is built from SAX events, the whole XML document is read and an object model corresponding to the whole document is created in memory. One might ask what the problem is. Think about a situation where you have a large XML document, perhaps a couple of megabytes, and you are interested only in the data in the first few kilobytes. Building the whole object model in memory is a waste of time, processor power and memory.
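The push style of SAX can be sketched with the JDK's built-in parser. The handler below (SaxPersonBuilder, a hypothetical name) populates person fields from the events it receives and counts start-element events, so you can observe that the parser delivers events for the entire document even though only the first Person is of interest. The Registrations and Other elements in the sample are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical builder: the parser pushes events, the handler fills in data.
class SaxPersonBuilder extends DefaultHandler {
    String name;
    int age;
    String sex;
    int startElementsSeen; // how many start-element events the parser pushed
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        startElementsSeen++;
        text.setLength(0); // begin collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "name": name = text.toString(); break;
            case "age":  age = Integer.parseInt(text.toString().trim()); break;
            case "sex":  sex = text.toString(); break;
            default: break; // wrapper and unrelated elements
        }
    }

    static SaxPersonBuilder parse(String xml) throws Exception {
        SaxPersonBuilder handler = new SaxPersonBuilder();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Registrations>"
            + "<Person><name>Alice</name><age>30</age><sex>F</sex></Person>"
            + "<Other/><Other/>"
            + "</Registrations>";
        SaxPersonBuilder b = parse(xml);
        // The counter includes the two trailing Other elements.
        System.out.println(b.name + " " + b.age + " " + b.sex
            + ", start elements seen: " + b.startElementsSeen);
    }
}
```

The handler cannot ask the parser to pause once it has what it needs. The usual way out is to throw an exception from a callback, which aborts the parse rather than suspending it.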

Databinding frameworks that use StAX, another event-generating parser model, have more control over parsing, because the framework pulls events from the parser on demand. ADB uses StAX as its interface for reading XML, so it parses the XML only to the extent required to serve the user's request. This can be considered an optimization over conventional XML databinding frameworks. More recent databinding frameworks can work with both SAX and StAX parsing.
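The difference in control can be sketched with the JDK's StAX reader. This is not ADB's implementation, only an illustration of pull parsing: the hypothetical StaxPeek class asks for events one at a time and simply stops asking once it has found the first name element. The sample document, including its Registrations and Other elements, is invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: pulls events only until the wanted data is found.
class StaxPeek {
    final String name;             // text of the first <name> element, or null
    final int startElementsPulled; // start-element events requested from the parser

    StaxPeek(String name, int startElementsPulled) {
        this.name = name;
        this.startElementsPulled = startElementsPulled;
    }

    static StaxPeek firstName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int pulled = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    pulled++;
                    if ("name".equals(reader.getLocalName())) {
                        // Found it: return without requesting any further events.
                        return new StaxPeek(reader.getElementText(), pulled);
                    }
                }
            }
            return new StaxPeek(null, pulled);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Registrations>"
            + "<Person><name>Alice</name><age>30</age><sex>F</sex></Person>"
            + "<Other/><Other/>"
            + "</Registrations>";
        StaxPeek result = firstName(xml);
        System.out.println(result.name + ", start elements pulled: "
            + result.startElementsPulled);
    }
}
```

Here the application requests events only for Registrations, Person and name; the age, sex and Other elements are never requested. A parser implementation may still buffer some input ahead internally, so the saving is in the events processed and the objects built rather than a guarantee about bytes read.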

Conclusion

Users and service authors like to work within the familiar constructs of a particular programming language when interacting with Web services, instead of with raw SOAP messages. Databinding is the process of mapping XML into those familiar constructs. Web service tools help by generating code that performs this mapping. The code generation process takes a WSDL as input and generates client code in a particular programming language, referred to as a "stub", which can access the Web service described by that WSDL. Alternatively, it can generate server-side code, referred to as a "service skeleton".

Resources

XML Schema Tutorial from W3C

ADB XML Schema Support by Amila Chinthaka Suriarachchi

Introduction to StAX by Eran Chinthaka

Author(s)

Eran Chinthaka, Member Apache Software Foundation, PMC Member Apache WS Project. chinthaka(!) at apache(!) dot org(!)

About Author

  • Eran Chinthaka
  • Software Engineer
  • WSO2 Inc.