2007/04/04

Implementing E4X in Rhino Using Apache AXIOM

  • Sameera Jayasoma
  • Senior Director, Platform Architecture - WSO2

Background

Some readers may not have used E4X, Rhino, or Apache AXIOM, so the Introduction explains each of them before they are used in the rest of the article. The article contains the following sections.

  1. Introduction
  2. Integration of Apache AXIOM to Rhino
  3. Using E4X in XML processing
  4. Motivation behind implementing E4X in Rhino using Apache AXIOM
  5. Performance

We first look at the implementation of E4X using AXIOM and at how you can use E4X for XML processing, and then focus on why this implementation is needed. Its main motive is to make JavaScript Web services in Apache Axis2 faster, and the final section describes how the implementation improves their performance.

Introduction

E4X in Rhino

XML processing is a significant part of the desktop and Web applications developed today. Most of these Web applications use XML as a data exchange format. For example, consider the relatively new concept of mashups. A mashup, as the name implies, mixes various things together: it brings different data, information, and applications into a single application. Mashups consume several Web services, extract information through their APIs (Application Programming Interfaces), and use it to build new applications that are far more useful than the individual parts.

Web services interact using SOAP, which is a protocol based on XML. Web services are described using the Web Service Description Language (WSDL), which is also encoded in XML. Consuming Web services involves a lot of XML processing on both the sender/client side and the receiver/server side. In some instances, your Web browser is the client that invokes the Web services; most of the time, scripting languages such as JavaScript are used to accomplish this task. Invoking a Web service involves making a request and then processing the response; these requests and responses are encoded in XML.

The JavaScript language is standardized by ECMA (The European Computer Manufacturers Association) and the official name is ECMAScript. I will be using the words JavaScript and ECMAScript interchangeably throughout the article. Earlier, XML processing in JavaScript seemed to be heavyweight, complex, and unfamiliar to JavaScript programmers. ECMAScript for XML (E4X) has been introduced to address these problems. E4X extends the ECMAScript language with native support for XML processing. This standard provides a simpler, familiar, general purpose XML programming model. It adds native XML data types to the ECMAScript language, extends the semantics of familiar ECMAScript operators for manipulating XML data and adds a small set of new operators for common XML operations, such as searching and filtering. It also adds support for XML literals, namespaces, qualified names, and other mechanisms to facilitate XML processing.

Rhino is an open source implementation of the ECMAScript standard written entirely in Java. It is a JavaScript engine that supports only the core language features. (It does not support manipulating HTML documents.) It is typically embedded into Java applications to give end users scripting capability. Apache Axis2/Java (a Web service engine) uses Rhino in JavaScript Web services.

Apache AXIOM, which stands for AXis Object Model, was developed specifically for Apache Axis2 to improve its performance, because XML processing (the handling of SOAP messages) is the most important and complex task in Web services. AXIOM differs from other object models in that it is lightweight and builds the object model on demand (deferred building). AXIOM achieves deferred building by relying on the underlying Streaming API for XML (StAX) parser, a pull parser, to input and output data.
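To make the pull-parsing idea concrete, here is a minimal Java sketch that uses only the JDK's StAX API (javax.xml.stream), not AXIOM itself. The class name, the method name, and the sample document are made up for illustration. The code pulls events only until it finds the first element with a given name and then stops, so the rest of the document is never read. Deferred building rests on the same pull model: the tree is constructed only as far as the caller has asked for it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; it is not part of AXIOM or Rhino.
public class PullParsingSketch {

    /** Result of the scan: the text found and how many parser events were pulled. */
    public static final class Result {
        public final String text;
        public final int eventsPulled;

        Result(String text, int eventsPulled) {
            this.text = text;
            this.eventsPulled = eventsPulled;
        }
    }

    /**
     * Pulls events from the parser until the first element with the given
     * local name is found, then returns its text without reading any further.
     * Returns a Result with null text if no such element exists.
     */
    public static Result firstElementText(String xml, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int events = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                events++;
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    // getElementText() reads the text up to the matching end tag.
                    String text = reader.getElementText();
                    reader.close();
                    return new Result(text, events);
                }
            }
            reader.close();
            return new Result(null, events);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<wso2><product id=\"wsf\"><name>Web Services Framework</name></product>"
                + "<product id=\"wsas\"><name>Web Services Application Server</name></product></wso2>";
        Result r = firstElementText(xml, "name");
        System.out.println(r.text + " (found after " + r.eventsPulled + " parser events)");
    }
}
```

The second product element in the sample is never visited, because the loop returns as soon as the first name element has been read.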

Integration of Apache AXIOM to Rhino

Rhino has supported E4X since release 1.6R1, and the Apache XMLBeans library is used to implement the E4X runtime. Although the default E4X implementation depends on XMLBeans, Rhino provides a pluggable interface for different E4X implementations. This enables users who embed Rhino in their applications to have their own E4X implementation based on a preferred XML object model. The provided interface consists of two abstract classes called XMLObject and XMLLib. The XMLObject abstract class describes what all XML and XMLList native objects should have in common.

E4X introduces four native objects to the ECMAScript language: the XML object, the XMLList object, the Namespace object, and the QName object. In addition to these native data types, E4X extends the semantics of familiar JavaScript operators and adds a small set of new operators for common operations, such as searching and filtering XML data.

  • Namespace objects represent XML namespaces and provide an association between a namespace prefix and a Uniform Resource Identifier (URI).

  • QName objects represent qualified names of XML elements and attributes.

  • XML objects represent an XML element, attribute, comment, processing-instructions or text node.

  • XMLList objects represent an XML document, an XML fragment, or an arbitrary collection of XML objects. (They can be the result of a query.)

The following diagram shows the class structure followed by this AXIOM implementation of E4X.

Using E4X in XML Processing

The examples in this section will help you understand the basics of XML processing in E4X. I use the JavaScript shell, a command line tool that ships with Rhino and provides a simple way to run scripts either in batch mode or interactively. Before you try out the examples, make sure you have the required packages, as described in the steps below.

  1. Download Rhino. Since E4X is not available in earlier releases, make sure to download the latest release. The default E4X implementation in Rhino is not an AXIOM implementation. Therefore, you need to build Rhino without the default E4X implementation and then add the AXIOM implementation of E4X to the classpath. Use the following command to build Rhino without E4X.

    ant jar -Dwithout-xmlimpl=true

    Once you build Rhino, the js.jar file is created in the {RhinoHome}/build/rhino{version} directory.

  2. Check out the source code of the AXIOM implementation. Use the following command.

    svn checkout https://wso2.org/repos/wso2/trunk/wsf/javascript/rhino/

    Follow the instructions given in the Readme file. Once you build the source using Apache Maven, you will get a .jar file called

    js-axiom-SNAPSHOT.jar
  3. In order to run the JavaScript shell in Rhino with the AXIOM implementation of E4X, several other dependencies are required. You can find them in your local Maven repository, where the build in step 2 downloads them. The dependencies are:

    axiom-api-SNAPSHOT.jar, axiom-impl-SNAPSHOT.jar, commons-logging-1.0.4.jar,
    stax-api-1.0.1.jar, wstx-asl-3.0.0.jar
  4. Copy js-axiom-SNAPSHOT.jar, js.jar, and all of the above JARs into the {RhinoHome}/lib directory. In {RhinoHome}, run the following command.

    java -cp lib/js.jar:lib/js-axiom-SNAPSHOT.jar:lib/axiom-api-SNAPSHOT.jar:lib/axiom-impl-SNAPSHOT.jar:lib/commons-logging-1.0.4.jar:lib/stax-api-1.0.1.jar:lib/wstx-asl-3.0.0.jar org.mozilla.javascript.tools.shell.Main

Let's start with a simple XML document that represents some information about WSO2 products. In order to manipulate this XML, you need to instantiate an XML object (a native object added to ECMAScript by E4X). There are two ways to instantiate an XML object. One way is to pass the XML as a string, as follows.

var s =  new XML(xmlString);

Now you can use this XML object to manipulate the data provided in xmlString. As you can see, XML becomes one of the native types that JavaScript understands. The second way is to simply write the XML inline in your JavaScript code, as follows.

var s = <wso2 category="products">
<product id="wsf">
<name>Web Services Framework</name>
<source>https://wso2.org/repos/wso2/wsas/java/</source>
</product>
<product id="wsas">
<name>Web Services Application Server</name>
<source>https://wso2.org/repos/wso2/wsas/java/</source>
</product>
</wso2>;

Variable "s" is an XML object. You can access any part of this XML using familiar JavaScript operators and some new operators introduced by E4X.

Accessing Child Properties of the XML Object

s.product;
s.@category;

This is the syntax for accessing properties (children) and attributes of values of the XML and XMLList types. There are two product elements in the above XML. Therefore, the first line returns an XMLList object that contains two XML objects representing the product elements. The result of the first line is:

<product id="wsf">
<name>Web Services Framework</name>
<source>https://wso2.org/repos/wso2/wsas/java/</source>
</product>
<product id="wsas">
<name>Web Services Application Server</name>
<source>https://wso2.org/repos/wso2/wsas/java/</source>
</product>

The second line returns the value of the attribute named "category". Another way to access child properties is as follows:

var prd = "product";
s[prd];

Since the result of s.product is an XMLList object, you can treat it as an array of XML objects. Therefore, you can access the first product element in the list using the following line.

s.product[0];

These property access operators access only the top level children of an XML object. If you need to access descendant children, you need to use the 'double dot operator' which is the descendant property accessor.

s..source;

This searches for descendants that match the QName "source" and returns an XMLList. The result is:

<source>https://wso2.org/repos/wso2/wsas/java/</source>
<source>https://wso2.org/repos/wso2/wsas/java/</source>

You can also filter a list with a predicate:

s.product.(@id=="wsf").source;

This line returns the source URL of the product whose id attribute is "wsf". Filtering of this kind is very useful in XML processing applications. The result is:

https://wso2.org/repos/wso2/wsas/java/
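For readers who know Java better than E4X, the queries shown in this section can be approximated with the JDK's XPath API. This is only a comparison sketch under stated assumptions: it does not use Rhino, E4X, or AXIOM, the class and method names are hypothetical, and the XPath expressions are my rough equivalents of the E4X expressions rather than anything defined by the E4X specification.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical comparison class; the E4X expressions in the comments are from the article.
public class XPathComparisonSketch {

    // The sample document from this section, without insignificant whitespace.
    static final String XML =
            "<wso2 category=\"products\">"
            + "<product id=\"wsf\"><name>Web Services Framework</name>"
            + "<source>https://wso2.org/repos/wso2/wsas/java/</source></product>"
            + "<product id=\"wsas\"><name>Web Services Application Server</name>"
            + "<source>https://wso2.org/repos/wso2/wsas/java/</source></product>"
            + "</wso2>";

    /** Evaluates an XPath expression against XML and returns the matching nodes. */
    public static NodeList select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate: " + expression, e);
        }
    }

    /** Evaluates an XPath expression against XML and returns its string value. */
    public static String value(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // E4X: s.product (both product elements)
        System.out.println(select("/wso2/product").getLength());
        // E4X: s.@category
        System.out.println(value("/wso2/@category"));
        // E4X: s..source (all descendant source elements)
        System.out.println(select("//source").getLength());
        // E4X: s.product.(@id=="wsf").source
        System.out.println(value("/wso2/product[@id='wsf']/source"));
    }
}
```

The comparison shows what E4X saves: each query that takes a factory, an input source, and a string expression in Java is a single native expression in E4X.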

Iterating Over a List of XML Objects

By using E4X, you can iterate over a list of XML objects in the following manner.

for each(var h in s..name){
print(h);
}

The result is:

Web Services Framework
Web Services Application Server

These are the basics of E4X, and they should give you a sense of the flexibility that E4X offers for manipulating XML.

Motive Behind Implementing E4X using Apache AXIOM

One of the most important reasons for implementing E4X in AXIOM is to make JavaScript Web services faster. Let's see how. Apache Axis2/Java, the next generation Web services engine, supports JavaScript Web services. In these Web services, the service implementation is written entirely in JavaScript. Since Rhino is a JavaScript interpreter and compiler, it can be embedded inside Apache Axis2/Java to provide the capability to deal with JavaScript.

As mentioned earlier, Apache AXIOM/Java is used as the underlying object model which facilitates XML processing in Apache Axis2/Java. Therefore, incoming SOAP requests and the responses are in the form of OMElements. OMElements represent XML elements in AXIOM.

In JavaScript Web services, the service implementation is based on JavaScript, which means the service class is a *.js file with several functions. In most cases, a JavaScript function that acts as a Web service operation takes an argument that represents the payload of the incoming SOAP message. The payload contains the data needed to consume the Web service. For example, consider the following JavaScript function.

function getWeather(xmlParam) {
var xml = new XML(xmlParam);
......
return answer;
}

Since Rhino converts this JavaScript code into a representation in Java, it is possible to pass Java objects as arguments to this function. The line var xml = new XML(xmlParam) instantiates a new XML object, and XML is one of the four native JavaScript objects added by E4X. Therefore, the type of argument (String, OMElement, and so on) that can be passed to these JavaScript functions depends entirely on the specific E4X implementation.

If E4X is implemented using XMLBeans, the argument cannot be an instance of OMElement. In that case, the OMElement has to be serialized to a String, and this serialization adds overhead. The same process has to be repeated on the return path: whatever the returned object is, it must first be serialized to a String and then parsed into an OMElement. This serialization and deserialization reduces the performance of JavaScript Web services deployed in Apache Axis2/Java by a considerable amount.
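The extra work can be pictured with a small Java sketch. To keep it self-contained, the JDK's own DOM and transformer APIs stand in for both AXIOM and XMLBeans; that substitution is an assumption made for illustration only, and neither library is used here. The class name and the sample payload (a city element containing "Colombo") are hypothetical. The point is the number of steps: one serialize and one parse on the way into the script, and the same two steps again on the way out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration: JDK DOM stands in for OMElement and for the XMLBeans tree.
public class RoundTripSketch {

    /** Step 1: serialize an in-memory element to a String. */
    public static String serialize(Element element) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Step 2: parse the String again to obtain a tree in the other object model. */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Element payload = parse("<getWeather><city>Colombo</city></getWeather>");
        // Request path: one serialize plus one parse before the script sees the data.
        Element forScript = parse(serialize(payload));
        // Return path: the same two steps again for the value the script returns.
        Element forEngine = parse(serialize(forScript));
        System.out.println(serialize(forEngine));
    }
}
```

When the E4X implementation and the Web service engine share one object model, the payload can be handed over as it is and all four steps disappear.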

Reducing the unnecessary performance hit is one of the most important goals of implementing the E4X specification in AXIOM. Although this is the most important, there are other motives behind the E4X implementation.

  • AXIOM is designed to be lightweight, which makes this implementation less memory intensive.

  • AXIOM does not build the whole object model at once; it builds it on demand. Due to this feature, AXIOM delivers great performance in certain applications.

These features make this implementation ideal for applications that need to work with only part of the document model, for applications where memory is constrained, and for Web service engines such as Apache Axis2. It is also the most suitable E4X implementation for JavaScript Web services deployed in Apache Axis2/Java.

Performance

Let's wrap things up with a quick look at the performance of the AXIOM-based E4X implementation when it is used in JavaScript Web services in Apache Axis2/Java. This section compares the E4X implementation in AXIOM with the E4X implementation in XMLBeans, each plugged into Rhino to support JavaScript Web services in Apache Axis2. For this purpose, two JavaScript Web services were deployed in an Apache Axis2/Java engine and a number of Web service invocations were made against the engine. Each of the JavaScript Web services contains four service operations: echoStrings, echoMeshInterface, getChildrenList, and getDescendantChildrenList. The first two simply echo the input data without processing it, while the last two perform some processing on the input data. For each of these operations, the size of the request payload was varied by increasing the number of elements from 1 to 100.

Test Framework

The following published software packages were used in this performance test.

  • Apache Axis2/Java 1.1.1

  • Rhino 1.6R5

  • Apache AXIOM/Java 1.2.2

  • Apache XMLBeans 2.2.0

The services were deployed in the same instance of Apache Tomcat (5.5.20) running on Sun's Java JDK 1.5.0_10-b03. The Java options used were:

JAVA_OPTS='-server -Xms2g -Xmx2g -Xss512k -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+AggressiveOpts -XX:+UseBiasedLocking'

The client machine was running Microsoft Windows Server 2003 R2 Enterprise Edition (Service Pack 1). The services ran on a 4-way Xeon 3.2 GHz server with 2GB RAM running Fedora Core 5. The Apache Bench tool [1] was used as the client driver; it issues HTTP POST requests using multiple threads. To simulate a high-load environment, 50 client threads were used, and each test consisted of 1000 requests, giving each thread 20 invocations.
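The load pattern described above (50 concurrent clients, 1000 requests in total, and therefore 20 invocations per client) can be sketched in Java. This is not the Apache Bench tool and it sends no HTTP traffic: the Runnable passed in is a placeholder for a single service invocation, and the class and method names are hypothetical. The sketch only shows how the totals relate and how a requests-per-second figure is derived from them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the load pattern; it is not the benchmark code [2].
public class LoadPatternSketch {

    /**
     * Runs totalRequests invocations spread evenly over the given number of
     * client threads and returns the achieved requests per second.
     */
    public static double run(int threads, int totalRequests, Runnable invocation) {
        if (threads <= 0 || totalRequests % threads != 0) {
            throw new IllegalArgumentException("totalRequests must be a multiple of threads");
        }
        final int perThread = totalRequests / threads; // 1000 / 50 = 20 in the article's setup
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> clients = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                clients.add(pool.submit(() -> {
                    for (int j = 0; j < perThread; j++) {
                        invocation.run();
                    }
                }));
            }
            // Wait for every simulated client to finish its share of the requests.
            for (Future<?> client : clients) {
                client.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return totalRequests / seconds;
    }

    public static void main(String[] args) {
        AtomicInteger invocations = new AtomicInteger();
        // The counter increment stands in for one Web service invocation.
        double requestsPerSecond = run(50, 1000, invocations::incrementAndGet);
        System.out.println(invocations.get() + " invocations, "
                + requestsPerSecond + " requests per second");
    }
}
```

With a real HTTP call in place of the counter increment, the returned value corresponds to the "requests per second" metric reported in the test results.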

See the resources section for the benchmark code [2].

Test Results

First, it should be mentioned that this test does not compare E4X/AXIOM with E4X/XMLBeans in simple XML processing. The results only indicate whether implementing E4X in Rhino using AXIOM met its goal of making JavaScript Web services in Apache Axis2 faster. The following test results show that the performance of JavaScript Web services improved by a considerable margin when E4X/AXIOM was plugged into Rhino.

Small Datasets.

Figure 1. Average number of requests per second for small datasets.

Figure 1 shows the average number of requests per second invoked on JavaScript Web services with the data size of just 1 element. The chart clearly shows that the performance is improved by nearly 23% for small datasets.

Large Datasets.

Figure 2. Average number of requests per second for large datasets.

The large dataset contains an array size of 100 elements. Figure 2 shows the average number of requests per second invoked against the JavaScript Web services with large datasets. Nearly 30% performance improvement can be seen for large datasets.

Figure 3. Average number of requests per second invoked on getDescendantChildrenList test.

The final chart shows the results of the getDescendantChildrenList test for array sizes from 1 to 100. It shows that, for larger array sizes, the E4X/AXIOM implementation performs faster than the E4X/XMLBeans implementation.

Conclusion

The above results show a considerable improvement in the performance of JavaScript Web services, which indicates that implementing E4X in AXIOM was successful. To support JavaScript Web services in Apache Axis2/Java, it is therefore better to use Rhino with the E4X/AXIOM implementation.

Resources

[1] ab - Apache HTTP Server Benchmarking Tool

[2] Performance Benchmark Code

[3] ECMAScript for XML (E4X) Specification

[4] Rhino - JavaScript for Java

[5] Apache AXIOM/Java

[6] Apache Axis2/Java

[7] Apache XMLBeans

Author

Sameera Madushan Jayasoma, Undergraduate, Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka. Intern at WSO2 Inc. sameera at wso2 dot com
