Importance of Performance and how WSO2 ESB handles the Non-Obvious

  • By Srinath Perera
  • 19 Apr, 2012
Note: An updated article is available: ESB Performance Round 6.5

What is performance?

Performance measures the amount of data, business processes, transformations, or rules processed within a specific timeframe. Server performance is gauged by measuring latency and throughput.

1. Latency measures the end-to-end processing time. In a messaging environment, teams determine latency by measuring the time between sending a request and receiving the response. Latency is usually measured from the client endpoint.

2. Throughput measures the number of messages that a server processes during a specific time interval (e.g. per second). Throughput is calculated by measuring the time taken to process a set of messages and then using the following equation.

Throughput = number of messages processed / time taken to process them

It is worth noting that these two values are often loosely related. A performance test team cannot directly derive one measurement from the other.
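As a rough illustration of the two measurements, the sketch below times a batch of requests from the client side and reports mean latency and throughput. The names `measure` and `fake_request` are hypothetical, and the 1 ms sleep only stands in for a real round trip to a server.

```python
import time

def measure(send_request, num_requests):
    """Send requests one at a time; return (mean latency in s, throughput in req/s)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_requests):
        sent_at = time.perf_counter()
        send_request()                      # blocks until the response arrives
        latencies.append(time.perf_counter() - sent_at)
    elapsed = time.perf_counter() - start
    mean_latency = sum(latencies) / num_requests
    throughput = num_requests / elapsed     # messages processed / time taken
    return mean_latency, throughput

def fake_request():
    """Stand-in for a real request/response round trip (assumption)."""
    time.sleep(0.001)

mean_latency, throughput = measure(fake_request, 50)
print(f"mean latency: {mean_latency * 1000:.2f} ms, throughput: {throughput:.0f} req/s")
```

With a single sequential client like this one, throughput is simply the inverse of mean latency. Once several clients send requests concurrently the two values diverge, which is why both are measured separately.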

Figure 1: Performance Characteristic Graphs

Figure 1 shows performance characteristic graphs for a typical server. Typically, a server has an initial range where throughput increases at a roughly linear rate and latency either remains constant or grows linearly. As concurrency increases, the approximately linear relationship decays, and system performance rapidly degrades. Performance tuning attempts to modify the relationship between concurrency and throughput and/or latency, and to maintain a linear relationship for as long as possible.

For more details about latency and throughput, read the following online resources:


Unlike static server capacity measurements (e.g. CPU processing speed, memory size), performance is a dynamic measurement. Latency and throughput are strongly influenced by concurrency and work unit size. A larger work unit size usually has a negative influence on latency and throughput. Concurrency is the number of work units (e.g. messages, business processes, transformations, or rules) being processed in parallel at a given moment. Higher concurrency values tend to increase latency (wait time) and decrease throughput (units processed). To visualize server performance across the range of possible workloads, performance teams graph latency or throughput against concurrency or work unit size.

Teams commonly set performance thresholds to optimize system responsiveness and decrease server cost. Optimum system responsiveness is achieved when latency equals the workload transfer rate. For example, if ten cars enter a road segment at a velocity of 100 km/h, the road segment would ideally transfer the cars at the same velocity. Ideally, the system would not introduce additional latency (lag). In reality, a server latency of less than 10 ms is preferred to maximize throughput. Optimum server cost is achieved when throughput is less than or equal to the maximum number of required work units. In networked systems or messaging environments, the ability to operate at wire speed is desired.

Although throughput and latency are used as central measurements, many other characteristics influence performance:

  • Memory footprint – How much memory would a server require? The maximum value indicates the lowest server density threshold.
  • Variation in latency – How stable is latency?
  • How does the server handle sudden spikes of requests?
  • How does the server handle slow clients? Slow clients may increase latency by not removing workloads as fast as the workloads are processed by the server.

Why does performance matter?

Performance is a server capacity measurement that influences operational cost and scale. One term recurs whenever performance is discussed: capacity planning. When an organization needs to set up a server, it commonly establishes a target workload amount. The organization will provision enough servers to process the target workload in a timely manner. The team may design the server environment as many low-performance servers or as a smaller number of high-performance servers.

As a rule of thumb, when an organization runs fewer servers, it reduces cost and operational effort. Increasing the number of servers will increase the hardware cost, require additional administration personnel, and increase solution and deployment architecture complexity. When server performance improves, fewer servers are required, and total cost of ownership decreases.

As most organizations cannot afford to reject a subset of business requests, teams calculate capacity-planning thresholds based on the projected maximum target workload. The first capacity planning step is to establish how many servers are required to meet a projected throughput within a given latency limit. Against an ideal uniform workload, the capacity planning calculation is straightforward. For example,

Number of servers = ceiling(projected maximum throughput / throughput of a single server within the latency limit)

Here the ceiling function rounds the value up.

For the ideal case, the other performance aspects (i.e. memory, variability, client speed) do not impact the required number of nodes. However, real-life workloads are not uniform, and teams commonly add resources to guard against variable load and environment conditions. Often, operational monitoring indicates that systems require significantly more server resources (e.g. 3-5 times more than the ideal number) to buffer against real-world performance degradation. However, a stable operational environment enables teams to reduce the number of resources serving as a buffer. Environment stability and performance predictability are therefore important solution characteristics.
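The capacity calculation can be sketched in a few lines. The function name `servers_required`, the `buffer_factor` parameter, and the workload figures are illustrative assumptions, not measurements from any real deployment.

```python
import math

def servers_required(target_throughput, per_server_throughput, buffer_factor=1.0):
    """Servers needed for a target load, rounded up to a whole server.

    per_server_throughput is the rate a single server sustains while staying
    within the latency limit; buffer_factor > 1 adds head-room for
    non-uniform, real-world load (assumed parameter).
    """
    if per_server_throughput <= 0:
        raise ValueError("per_server_throughput must be positive")
    return math.ceil(buffer_factor * target_throughput / per_server_throughput)

# Hypothetical figures: 10,000 msg/s peak, 3,000 msg/s per server.
ideal = servers_required(10_000, 3_000)                      # ceil(3.33...) = 4
buffered = servers_required(10_000, 3_000, buffer_factor=3)  # ceil(10.0) = 10
print(ideal, buffered)
```

The gap between the two results shows why predictability matters: a server whose performance is stable under variable load lets the team run with a smaller buffer factor.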

How to improve ESB throughput and latency

Figure 2: Typical ESB Deployment

Figure 2 depicts a typical ESB deployment. The ESB component may perform mediation operations on the message, network protocol, and message route. Client request messages pass through the ESB: the ESB intercepts each request, mediates it, and forwards the message to the back end servers. After the back end server has processed the request, the server sends the response back to the ESB, and the ESB sends the response back to the client.

Almost all ESB products deliver good throughput and latency against a steady-state workload. However, when unavoidable load variations occur, significant performance differences materialize. Let us have a detailed look at some of the less trivial ESB performance aspects:

1. Handling parallel message transport connections: When designing message servers, ESB architects must decide how to effectively process parallel requests. Most conventional servers like Tomcat use a model where the server allocates a thread per client request, and the number of concurrent, parallel connections is limited by the maximum number of threads in the thread pool. If we set the number of threads very high, the server starts to thrash due to context switching, and performance is degraded. Non-blocking transports (e.g. WSO2 ESB NIO transport) allow servers to process work with fewer threads and minimize threads blocking while performing I/O operations. When using a non-blocking transport, the server handles I/O operations using fewer threads, and other threads (which are called worker threads) work without blocking. Therefore, few threads (e.g. 20 threads in WSO2 ESB) can easily handle thousands of parallel connections.
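The idea of serving many connections from few threads can be sketched with Python's standard `selectors` module. This is not the WSO2 ESB NIO transport, only a minimal single-threaded illustration of the same readiness-based pattern. In-process socket pairs stand in for real client connections, and `serve_ready` is a hypothetical helper.

```python
import selectors
import socket

def serve_ready(selector):
    """Reply (upper-cased) on every connection that has data right now."""
    handled = 0
    for key, _events in selector.select(timeout=1.0):
        conn = key.fileobj
        data = conn.recv(4096)          # will not block: the selector reported readiness
        if data:
            conn.send(data.upper())     # a real server must also handle partial writes
            handled += 1
    return handled

NUM_CLIENTS = 50
selector = selectors.DefaultSelector()
clients = []
for i in range(NUM_CLIENTS):
    client_end, server_end = socket.socketpair()
    server_end.setblocking(False)
    selector.register(server_end, selectors.EVENT_READ)
    client_end.sendall(f"hello {i}".encode())
    clients.append(client_end)

# One thread serves all 50 connections; no thread is parked per client.
handled = 0
while handled < NUM_CLIENTS:
    served = serve_ready(selector)
    if served == 0:                     # nothing became ready within the timeout
        break
    handled += served

replies = [c.recv(4096) for c in clients]
print(handled, replies[0])
```

In a thread-per-request model the same workload would need 50 threads, most of them idle while waiting on I/O.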

2. Pass-through for header-based mediation – ESB nodes often perform mediation tasks (e.g. filter, adjust, and reroute messages). Often, the ESB can perform mediation solely by inspecting the message headers, and therefore does not need to inspect the message content. In situations where the message body may be opaque to the ESB, the ESB can avoid reading the message body and directly transfer the body content from the incoming input stream to the output stream. The same can be done for the response flow as well. WSO2 ESB offers a special binary relay mode that supports this pass-through functionality. The binary relay provides two types of gains.

a. Make the mediation faster by avoiding message parsing and memory copying

b. Minimize the memory footprint by avoiding loading the message into memory.
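The pass-through idea can be illustrated with a small sketch: the route is chosen from a header alone, and the body is copied from input to output in fixed-size chunks without ever being parsed. The `relay` function, the `X-Route` header name, and the in-memory backends are hypothetical. This is not how binary relay is configured in WSO2 ESB.

```python
import io
import shutil

def relay(headers, body_in, backends, chunk_size=8192):
    """Choose a backend from the headers, then stream the body through untouched."""
    route = headers.get("X-Route", "default")   # hypothetical routing header
    body_out = backends[route]
    # Copy chunk by chunk: the body is never parsed and never held in memory whole.
    shutil.copyfileobj(body_in, body_out, chunk_size)
    return route

backends = {"orders": io.BytesIO(), "default": io.BytesIO()}
payload = b"<opaque payload that is never parsed" * 1000
route = relay({"X-Route": "orders"}, io.BytesIO(payload), backends)
print(route, len(backends["orders"].getvalue()))
```

Because the payload is treated as opaque bytes, it does not even need to be well-formed XML for the relay to work, which matches both gains listed above.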

3. Optimized message handling for message inspection use cases – Axis2, the underlying SOAP runtime used by WSO2, supports lazy XML processing. That is, the SOAP processor reads data from the input stream as late as possible and is able to transfer the unprocessed data to the next processing step. Therefore, if the mediation logic needs to read only part of a large SOAP message, Axis2 will build the in-memory object model only up to the required point. The rest of the SOAP message is conveyed directly to the backend server at the StAX event level.
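Lazy processing can be approximated with Python's incremental `xml.etree.ElementTree.iterparse`, used here as a stand-in for the StAX-based parsing in Axis2. The helper `first_text`, the `CountingStream` wrapper, and the sample document are assumptions made for illustration. The wrapper only exists to show how little of the input is consumed.

```python
import io
import xml.etree.ElementTree as ET

class CountingStream:
    """File-like wrapper that records how many bytes the parser has pulled."""
    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._inner.read(size)
        self.bytes_read += len(chunk)
        return chunk

def first_text(stream, tag):
    """Return the text of the first <tag> element, parsing no further than needed."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            return elem.text            # stop here; the rest is never parsed
    return None

# A large message whose interesting field sits near the start.
document = b"<order><id>42</id>" + b"<item>x</item>" * 100_000 + b"</order>"
stream = CountingStream(document)
order_id = first_text(stream, "id")
print(order_id, stream.bytes_read, len(document))
```

Only the first chunk of the stream is read before the function returns. A full DOM-style parse would have consumed and materialized the entire document.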

4. Handling slow clients – Although WSO2 ESB NIO transports prevent threads from blocking on I/O most of the time, with a slow client (reading from or writing to the server), worker threads could still block.

Figure 3: Slow Client Scenario

To understand how a slow client impacts performance, we have to understand how a network works. When a writer writes to a network connection, the content is transferred into a network buffer at the writer’s end. The network layer then transfers that data to the reader’s network buffer, and the reader reads the data from that buffer. A client connected to a slow network will read data slowly from the network buffers, and the writer is forced to wait longer for space in its output buffer. The writer will block until the network buffer is drained. The same thing can happen when the server reads from a network connection and the client is behind a slow network.

If the server encounters many simultaneous slow clients, the scenario can block multiple worker threads and slow down the ESB. The new NIO transport introduced in the article improves the current WSO2 ESB NIO transport by completely avoiding network blocking.
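The buffer behaviour behind the slow-client problem can be observed with a local socket pair in which the reader never reads. The function `fill_until_blocked` is a hypothetical helper, and the exact number of bytes accepted depends on the operating system's buffer sizes.

```python
import socket

def fill_until_blocked(sock, chunk=b"x" * 65536, limit=1 << 30):
    """Write without blocking until the OS buffers are full; return bytes accepted."""
    sock.setblocking(False)
    sent = 0
    try:
        while sent < limit:             # limit is only a safety guard
            sent += sock.send(chunk)
    except BlockingIOError:
        # Buffers are full: a blocking socket would now park the worker thread
        # until the slow reader drains some data.
        pass
    return sent

writer, slow_reader = socket.socketpair()
buffered = fill_until_blocked(writer)   # slow_reader never calls recv()
print(f"writer stalled after {buffered} bytes")
```

A non-blocking transport reacts to this condition by registering interest in writability and moving on to other connections, instead of letting a worker thread wait on the stalled one.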


This article introduces the basic concepts behind performance and explains the role of throughput and latency in understanding the behavior of a server. However, many other parameters affect the server’s behavior when it is handling non-uniform loads. The article discusses several such parameters and some of the design decisions in the ESB architecture that avoid or minimize the effect of those aspects on server performance.