What is performance?
Performance measures the amount
of data, business processes, transformations, or rules processed within a
specific timeframe. Server performance is gauged by measuring latency and
throughput.
1. Latency measures the end-to-end
processing time. In a messaging
environment, teams determine latency by measuring the time between sending a
request and receiving the response. Latency is usually measured from the client.
2. Throughput measures the number of
messages that a server processes during a specific time interval (e.g. per
second). Throughput is calculated by measuring the time taken to process a
set of messages and then using the following equation.
Throughput = number of messages processed / time taken to process them
It is worth noting that these
two values are often loosely related. A performance test team cannot directly
derive one measurement from the other.
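Both measurements can be collected in the same test run. The following is a minimal Python sketch, not a benchmarking tool: `handle_message` is a hypothetical stand-in for a real request to a server, and the message count is arbitrary.

```python
import time


def handle_message(payload: str) -> str:
    """Hypothetical stand-in for a server call; it just reverses the text."""
    return payload[::-1]


def measure(messages):
    """Return (per-message latencies in seconds, throughput in messages/second)."""
    latencies = []
    start = time.perf_counter()
    for message in messages:
        sent_at = time.perf_counter()
        handle_message(message)
        # Latency: time between sending the request and receiving the response.
        latencies.append(time.perf_counter() - sent_at)
    elapsed = time.perf_counter() - start
    # Throughput: messages processed divided by the time taken to process them.
    throughput = len(messages) / elapsed
    return latencies, throughput


latencies, throughput = measure([f"message-{i}" for i in range(1000)])
mean_latency = sum(latencies) / len(latencies)
print(f"mean latency: {mean_latency * 1e6:.1f} us, throughput: {throughput:.0f} msg/s")
```

Because this sketch sends one message at a time, throughput is roughly the inverse of the mean latency. Once requests are processed concurrently that shortcut no longer holds, which is why the two values have to be measured separately.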
Figure 1: Performance Characteristic
Figure 1 shows performance
characteristic graphs for a typical server. Typically, a server has an initial
range where throughput increases at a roughly linear rate and latency either
remains constant or increases linearly. As
concurrency increases, the approximately linear relationship decays, and system
performance rapidly degrades.
Performance tuning attempts to modify the relationship between
concurrency and throughput and/or latency, and maintain a linear relationship
for as long as possible.
Unlike static server capacity
measurements (e.g. CPU processing speed, memory size), performance is a dynamic
measurement. Latency and throughput
are strongly influenced by concurrency and work unit size. Larger work units usually degrade
both latency and throughput.
Concurrency is the number of work units (e.g. messages,
business processes, transformations, or rules) being processed in parallel. Higher concurrency
values tend to increase latency (wait time) and decrease throughput
(units processed). To visualize
server performance across the range of possible workloads, performance teams graph
latency or throughput against concurrency or work unit size.
Teams commonly set performance
thresholds to optimize system responsiveness and decrease server cost. Optimum system responsiveness is
achieved when the system transfers the workload at the rate at which it arrives. For example, if ten cars enter a road
segment at 100 km/h, the road segment would ideally carry the cars at
the same speed; that is, the system would not introduce additional latency
(lag). In practice, a server latency
of less than 10 ms is preferred to maximize throughput. Optimum server cost is achieved when
throughput is less than or equal to the maximum number of required work units. In networked systems or messaging
environments, the ability to operate at wire speed is desired.
Although throughput and
latency are used as central measurements, many other characteristics influence
server performance:
- Memory footprint – how much
memory would a server require? The maximum value indicates the lowest server
- Variation in latency – How
stable is latency?
- How does the server handle sudden
spikes of requests?
- How does the server handle slow
clients? Slow clients may increase
latency by not removing work as fast as the server
processes it.
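Latency stability can be quantified by looking at the spread of the measured values in addition to the mean. The sketch below is one possible way to do this in Python; the choice of standard deviation and nearest-rank percentiles is an assumption of this example, and the sample values are invented for illustration.

```python
import math
import statistics


def latency_summary(latencies_ms):
    """Summarize how stable a set of latency samples (in milliseconds) is."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Illustrative samples: nine fast responses and one slow outlier.
samples_ms = [10, 11, 9, 10, 12, 10, 11, 95, 10, 12]
summary = latency_summary(samples_ms)
print(summary)
```

In this invented data set the median stays at 10 ms while the mean is pulled up to 19 ms by a single slow response, which is the kind of instability the mean alone would hide.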
Why does performance matter?
Performance is a server
capacity measurement that influences operational cost and scale. A term that recurs alongside
performance is capacity planning. When an organization needs to set up a server,
it commonly establishes a target workload. The organization then provisions enough
servers to process the target workload in a timely manner. The team may design the server
environment as many low-performance servers or as a smaller number of
high-performance servers.
As a rule of thumb, when
an organization runs fewer servers, it reduces cost and operational
effort. Increasing the number
of servers increases the hardware cost, requires additional administration
personnel, and increases solution and deployment architecture complexity. When server performance
improves, fewer servers are required, and total cost of ownership decreases.
As most organizations cannot
afford to reject a subset of business requests, teams calculate capacity-planning
thresholds based on the projected maximum target workload. The first capacity
planning step is to establish how many servers are required to meet a projected
throughput within a given latency limit.
Against an ideal uniform workload, the capacity planning calculation is
straightforward. For example,
Number of servers = ceiling(projected throughput / throughput of a single server)
Here the ceiling function
rounds the value up.
For the ideal case, the other
performance aspects (i.e. memory, variability, client speed) do not impact the
required number of nodes. However, real-life workloads are not uniform, and
teams commonly add resources to guard against variable load and environment
conditions. Often, operational monitoring indicates that systems require a
significant number of server resources (e.g. 3-5 times more than the ideal case) to
buffer against real-world performance degradation conditions. However, a stable
operational environment enables teams to reduce the number of resources serving
as a buffer. Environment stability
and performance predictability are important solution characteristics.
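The capacity planning arithmetic can be sketched in a few lines of Python. The function name, the `headroom_factor` parameter, and all numbers below are assumptions made for illustration; the 3x factor is simply the lower end of the 3-5 times range mentioned above.

```python
import math


def servers_required(target_throughput, per_server_throughput, headroom_factor=1.0):
    """Number of servers needed to meet a projected throughput.

    headroom_factor scales the target to leave a buffer for non-uniform load
    (1.0 means the ideal, uniform-workload case).
    """
    if per_server_throughput <= 0:
        raise ValueError("per_server_throughput must be positive")
    # The ceiling rounds up, because a fraction of a server cannot be provisioned.
    return math.ceil(target_throughput * headroom_factor / per_server_throughput)


# Hypothetical figures: 10,000 messages/s target, 1,500 messages/s per server
# measured within the accepted latency limit.
ideal = servers_required(10_000, 1_500)
buffered = servers_required(10_000, 1_500, headroom_factor=3.0)
print(ideal, buffered)  # 7 20
```

With these assumed figures, the ideal case needs 7 servers while a 3x buffer raises the count to 20, which shows how quickly an unstable environment inflates cost.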
How to improve ESB throughput and latency
Figure 2: Typical ESB Deployment
Figure 2 depicts a typical ESB
deployment. The ESB component may perform mediation operations on the message,
network protocol, and message route. Client request messages pass through the
ESB: the ESB intercepts each request, mediates it, and forwards the
message to the back-end servers. After a back-end server has processed the
request, it sends the response back to the ESB, and the ESB sends the response
back to the client.
Almost all ESB products deliver
good throughput and latency against a steady-state workload. However, when
unavoidable load variations occur, significant performance differences
materialize. Let us take a detailed look at some of the less trivial ESB design considerations.
1. Handling parallel message transport
connections: When designing message servers, ESB architects must decide how to
process parallel requests effectively. Most conventional servers, such as Tomcat,
use a model where the server allocates a thread per client request, so the number of concurrent connections is limited by the
maximum number of threads in the thread pool. If the number of
threads is set very high, the server starts to thrash due to context switching, and
performance degrades. Non-blocking transports (e.g. the WSO2 ESB NIO transport)
allow servers to process work with fewer threads and minimize the time threads spend blocked
on I/O operations. With a non-blocking transport, a small number of
threads handles the I/O operations, and the remaining threads (called
worker threads) work without blocking. Therefore, a few threads (e.g. 20
threads in WSO2 ESB) can easily handle thousands of parallel connections.
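The idea behind a non-blocking transport can be illustrated with readiness-based I/O, where one thread asks the operating system which connections have data and services only those. The sketch below uses Python's standard `selectors` module as an analogue; it is not the WSO2 ESB implementation, and it uses in-process socket pairs in place of real client connections.

```python
import selectors
import socket


def serve_with_one_thread(connection_count=50):
    """Serve many connections from a single thread using readiness events."""
    selector = selectors.DefaultSelector()
    clients = []
    for i in range(connection_count):
        client_end, server_end = socket.socketpair()
        server_end.setblocking(False)
        # Ask to be notified when this connection has data to read.
        selector.register(server_end, selectors.EVENT_READ)
        client_end.sendall(f"request-{i}".encode())
        clients.append(client_end)

    served = 0
    while served < connection_count:
        # select() returns only the connections that are ready, so the
        # thread never waits on one idle connection while others have work.
        for key, _ in selector.select():
            connection = key.fileobj
            data = connection.recv(1024)
            connection.sendall(data.upper())  # tiny reply, fits in the buffer
            selector.unregister(connection)
            connection.close()
            served += 1

    replies = [client.recv(1024).decode() for client in clients]
    for client in clients:
        client.close()
    selector.close()
    return replies


replies = serve_with_one_thread()
print(len(replies), "connections served by one thread; first reply:", replies[0])
```

A thread-per-request server would need 50 threads for the same work. Here the connection count can grow without adding threads, limited mainly by file descriptors and memory.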
2. Pass-through for header-based mediation
– ESB nodes often perform mediation tasks (e.g. filtering, adjusting, and
rerouting messages). Often, the ESB can perform mediation solely by inspecting
the message headers and therefore does not need to inspect the message content.
In situations where the message body may be opaque to the ESB, the ESB can
avoid reading the message body and directly transfer its content from
the incoming input stream to the output stream. The same can be done for the
response flow. The WSO2 ESB server offers a special binary relay mode that
supports this pass-through functionality. More details about binary relay can be found
at https://wso2.org/library/articles/binary-relay-efficient-way-pass-both-xml-non-xml-content-through-apache-synapse, and the article
https://wso2.org/library/articles/2012/03/performance-benchmarking-wso2-esb-different-message-transfer-mechanisms presents the impact of binary relay. The binary relay provides two types of gains.
a. Faster mediation, by avoiding
message parsing and memory copying.
b. A smaller memory footprint, by
avoiding loading the message into memory.
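Header-only mediation with a pass-through body can be sketched as follows. This is a simplified Python illustration rather than the binary relay itself: the `X-Route` header, the route table, and the backend names are hypothetical, and in-memory streams stand in for network connections.

```python
import io
import shutil


def relay(incoming, outgoing, routes, chunk_size=8192):
    """Pick a destination from the headers, then copy the body unparsed."""
    headers = {}
    while True:
        line = incoming.readline()
        if line in (b"\r\n", b"\n", b""):  # a blank line ends the header block
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()

    destination = routes.get(headers.get("x-route", ""), "default-backend")
    # The body is moved in fixed-size chunks and never parsed or fully
    # loaded, so memory use does not grow with the message size.
    shutil.copyfileobj(incoming, outgoing, chunk_size)
    return destination


body = bytes(range(256)) * 1024  # opaque binary payload
message = b"X-Route: orders\r\nContent-Type: application/octet-stream\r\n\r\n" + body
outgoing = io.BytesIO()
destination = relay(io.BytesIO(message), outgoing, {"orders": "orders-backend"})
print(destination, len(outgoing.getvalue()))
```

The routing decision is made from a few header bytes, and the payload can be any content type because the relay never interprets it.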
3. Optimized message handling for
message inspection use cases – Axis2, the underlying SOAP runtime used by
WSO2, supports lazy XML processing. That is, the SOAP processor reads data from
the input stream as late as possible and is able to transfer the unprocessed data
to the next processing step. Therefore, if the mediation
logic needs to read only part of a large SOAP message, Axis2 builds
the in-memory object model only up to the required point. The rest of the SOAP
message is conveyed directly to the back-end server at the StAX event level.
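Lazy processing can be illustrated with an incremental parser that stops as soon as the required element has been read. The sketch below uses `iterparse` from Python's standard library as an analogue of this behavior, not Axis2 itself; the simplified envelope and the `routingKey` element are hypothetical, and the sketch covers only the deferred parsing, not forwarding the unread remainder.

```python
import io
import xml.etree.ElementTree as ET


def read_routing_key(stream):
    """Parse the stream only until the routingKey element is complete."""
    for _, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "routingKey":
            return element.text  # stop here; the rest is never parsed
    return None


# A large message whose routing information sits at the very beginning.
message_bytes = (
    "<Envelope><Header><routingKey>orders</routingKey></Header><Body>"
    + "<item>payload</item>" * 50_000
    + "</Body></Envelope>"
).encode()

stream = io.BytesIO(message_bytes)
key = read_routing_key(stream)
consumed = stream.tell()
print(f"key={key}, read {consumed} of {len(message_bytes)} bytes")
```

Only a small leading portion of the roughly 1 MB message is read from the stream before the routing key is available, so the cost of inspection depends on where the needed data sits rather than on the total message size.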
4. Handling slow clients –
although the WSO2 ESB NIO transports prevent threads from blocking on I/O most of the
time, with a slow client (reading from or writing to the server), worker threads could
still block.
Figure 3: Slow Client Scenario
To understand how a slow client impacts performance, we have to
understand how a network connection works. When a writer writes to a network connection, the
content is transferred into a network buffer at the writer’s end, the
network layer then transfers that data to the reader’s network buffer, and finally the
reader reads the data from its network buffer. A client
connected to a slow network reads data slowly from the network buffers, so
the writer is forced to wait longer for space in its output buffer.
The writer blocks until
space is freed in the network buffer. The same thing can happen when the server reads
from a network connection and the client is behind a slow network.
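The buffer behavior described above can be observed directly. In the following Python sketch, a local socket pair stands in for a client connection, and the reader never reads, which is the extreme case of a slow client. The writer is put in non-blocking mode so that, instead of stalling, it reports the point at which a blocking writer would have had to wait. The chunk size and safety limit are arbitrary choices for this example.

```python
import socket


def fill_until_blocked(chunk=b"x" * 4096, limit=64 * 1024 * 1024):
    """Write to a connection whose reader never reads, until buffers are full."""
    writer, slow_reader = socket.socketpair()
    writer.setblocking(False)
    sent = 0
    would_block = False
    try:
        while sent < limit:  # safety limit so the loop always terminates
            sent += writer.send(chunk)
    except BlockingIOError:
        # The network buffers are full: a blocking writer thread would be
        # stuck at this point until the reader consumed some data.
        would_block = True
    finally:
        writer.close()
        slow_reader.close()
    return sent, would_block


sent, would_block = fill_until_blocked()
print(f"buffered {sent} bytes before the writer had to wait: {would_block}")
```

The number of bytes accepted before the writer has to wait depends on the operating system's buffer sizes, so it varies between platforms.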
If the server encounters many simultaneous slow clients, the
scenario can block multiple worker threads and slow down the ESB. The new NIO
transport introduced in the article https://wso2.org/library/articles/2012/03/performance-benchmarking-wso2-esb-different-message-transfer-mechanisms improves on the current WSO2 ESB NIO
transport by completely avoiding network blocking.
This article introduces the
basic concepts behind performance and explains the role of throughput and
latency in understanding the behavior of a server. However, many
other parameters affect a server’s behavior when it handles
non-uniform loads. The article discusses several such parameters and
some of the design decisions in ESB architecture that avoid or minimize
the effect of those aspects on server performance.