January 06, 2019

Measuring Software Scalability using Universal Scalability Law

Introduction

We discussed the basic concepts of the Universal Scalability Law (USL) and a practical use case of USL for hardware scalability testing in our previous article. We can measure the scalability of a system in two different ways. First, as we have already explored in the previous article, we are interested in how the system scales when we add more hardware. Second, we are interested in how many concurrent users our application can support. In the latter scenario, we are interested in finding the scalability characteristics of the application under different concurrency levels, given a fixed hardware configuration.

To elaborate more on the second approach, consider a simple client-server web application which is run on a particular machine (hardware is fixed). Let’s assume that when you test the application with 100 concurrent users, you get a 1000 transactions per second maximum throughput. When you increase the concurrency level to 200, ideally you should get a throughput of 2000 transactions per second. But the reality is that you will find that the maximum throughput at a 200 concurrency level is slightly less than 2000 transactions per second. When you further increase the level of concurrency, you will realize that throughput starts to display retrograde behavior. So how can we model this behavior in a formal manner?

One way to address this problem is through exhaustive capacity planning. That is, you measure the throughput of the system while increasing the level of concurrency until you find the concurrency level which shows retrograde throughput behavior. This may sound like a useful method, but in reality, this requires a larger budget and a substantial amount of time. As such, how can we achieve our target of modeling system behavior under different concurrency levels in a more efficient and cost-effective manner? This can be achieved with the Universal Scalability Law (USL).

In the following paragraphs, I will present how USL for software is derived by extending Amdahl’s law. I will then explain how USL for software differs from USL for hardware. Furthermore, a real-world use case using USL and a discussion of its results and implications are included.

Amdahl’s Law for Software Scalability

Amdahl’s Law accounts for the reduction of speed up due to the fraction of a program that runs sequentially. In his book [1], Hennessy has provided a more elaborative form of Amdahl’s law as follows:

Equation 1

Fractionenhanced accounts for the portion of run time we are interested in reducing. Speedupenhanced is the inverse of the fractional time reduction.

If we denote fraction enhanced by π and speed up enhanced (fractional time reduction) by φ, we can write Equation 1 as follows:

Equation 2

Let σ = 1 - π, where σ is the serial fraction of the workload.

Assume that π (fraction enhanced) can be divided into N parts. Then φ becomes 1/N.

Then we can write Equation 2 as (3) which is further simplified in (4).

Equation 3

Equation 4

Equation (4) looks quite familiar and indeed it should. That is the equation of Amdahl’s Law for software. It is identical to the equation of Amdahl’s Law for hardware scalability we presented in our previous article.

Using this formula, we can derive the USL equation for software scalability by introducing the impact of interprocess communication among different concurrent users.

Universal Scalability Law for Software Scalability

Neil Gunther has provided a formal equation for software scalability as in (5) [2]. The same argument used in hardware scalability applies here - when there are N number of concurrent users running in parallel, at most there will be N(N-1) number of interactions taking place between each user process. To capture this behavior, we introduce a new parameter β, which is called coherency. Note that σ and k in USL equation are replaced by α and β just to avoid any confusion with the USL equation for hardware.

Equation 5

In hardware scalability, we defined the independent variable p as the number of hardware processors or nodes. But here, N -the independent variable - stands for the number of concurrent users (software processes) or load generators.

Being a rational function, equation (5) can be differentiated with respect to N. Differentiating equation (5), with respect to N, and equating the result to 0, we get the value of N at which Csw(N) is maximum (since equation 5 has a concave shape). We denote this value of N by N*. Then the maximum value of Csw(N) is Csw(N*).

Equation 6

USL for software scalability is identical to that of hardware scalability, yet the underlying assumptions are different. In hardware scalability testing, we rely on the underlying assumption that the number of software processes (N), executing in each hardware processor is fixed throughout the experimentation - that is, N/p remains fixed, where p is the number of hardware processors. In software scalability tests, scalability is measured as a function of the user load N. We rely on the assumption that the underlying hardware platform is fixed for all measured points of N.

To summarize, USL is a rational function of 3 parameters - the level of concurrency (N), contention (α) which accounts for the serial fraction of the workload, and coherency (β) which is the penalty for interprocess communication.

Let us now explore the practical uses of USL for capacity planning in detail.

How to Measure Throughput?

When testing for software scalability, you need to get the system throughput for different concurrency levels. We use load testing tools like JMeter to simulate the concurrent users. I will provide a brief explanation on how you can use JMeter for software performance testing in this section. If you are looking for more details, here is an end to end tutorial on how to perform JMeter tests.

Figure 1 shows the basic experimental setup for performance testing. At the specified level of concurrency, the JMeter client sends the same request to the endpoint it is configured to (in this case, the address of the server). For an instance, if we specify the level of concurrency to be 100, at a given time, JMeter spawns 100 parallel threads and sends requests to the Server. Upon receiving a request, the server processes the request and sends the response back to the JMeter client. Upon receiving the response from the server, each JMeter thread sends the next request (with 0 sleep time). If we assume that the JMeter client has enough hardware resources to handle the given concurrency level, then it is the server that determines the number of requests it can process at a given time. Hence, by collecting the JTL file (saved in the JMeter client), we can determine the number of requests that are processed in a given second. This is the throughput value we need.

We will now experiment on the performance data of WSO2 Enterprise Integrator[4]. We have run all these performance tests using Amazon EC2 while adhering to industry best practices, to make sure the repeatability of tests.

WSO2 Enterprise Integrator Dataset

WSO2 Enterprise Integrator is a fast, lightweight, and 100% open source product distributed under the Apache Software License v2.0. WSO2 Enterprise Integrator allows system administrators and developers to conveniently configure message routing, mediation, transformation, logging, task scheduling, failover routing, load balancing, and more. This article features a basic use case of WSO2 Enterprise Integrator, which is Direct Proxy or Simple Pass-Through Proxy.

Since WSO2 Enterprise Integrator mediates requests between a client and a server, in addition to the configuration specified in Figure 1, we need a backend server for performance tests using this product. We use a simple Netty Echo service as the backend. You will also notice that we are using 3 JMeter instances here, unlike the configuration mentioned in Figure 1. JMeter client is used to distribute the load among two other JMeter servers equally. This is to ensure that our JMeter nodes do not run out of resources when running in very high concurrency levels (usually more than 1000). For instance, if a JMeter client wants to send requests at a 1000 concurrency level, both JMeter Server 1 and 2 will send requests at a 500 concurrency level, thus the aggregated concurrency level is 1000. Figure 2 below illustrates the setup we use when testing EI.

Simple Pass-Through Proxy (Direct Proxy) forwards messages to the endpoint specified (in this case Netty HTTP Echo service), without performing any processing on them. This proxy service is useful as a catch-all so that the messages that do not meet the criteria to be handled by other proxy services are simply forwarded to the endpoint.

Figure 2 (source: https://github.com/ThishaniLucas/performance-ei/tree/perf-test)

We use the WSO2 Enterprise Integrator 6.4.0 for these experiments. In this dataset, we test for 3 different message sizes - 500B, 1KB, and 10KB. Furthermore, for each message size, we experiment on 4 different concurrency levels - 100, 200, 500, and 1000. For each message size and concurrency level configuration, we run the test in c5.xlarge Amazon EC2 instance for 15 minutes. We fix the heap size to 4GB and backend service delay to 0 seconds. Since the scalability characteristics of a system are stated in the steady state, we remove the first 5 minutes results from the JTL files and consider only the last 10 minutes results. Table 1 below depicts these performance results.

Message Size (KB) Concurrency (N) Throughput (requests per second) Average Latency (ms)
500B 100 17588.2 5.63
500B 200 19509.07 10.17
500B 500 19940.62 24.92
500B 1000 19764.01 50.5
1KB 100 16667.76 5.93
1KB 200 18175.66 10.87
1KB 500 18235.69 27.23
1KB 1000 18255.44 54.69
100KB 100 13173.19 7.5
100KB 200 13937.52 14.22
100KB 500 13434.7 37.1
100KB 1000 13203.39 75.63

Table 1

It is possible to manually compute the coefficients of α and β using regression.However, this article only focuses on how we can , use R USL package to compute the universal scalability law parameters.

Figure 3 below shows the code to compute USL parameters for the above dataset.

Figure 3: R USL code

Table 2 below summarizes the USL parameters for this dataset. Figure 4 depicts the USL curves for three message sizes.

Message Size α β Max Users (N*) Max Throughput (requests/second)
500B 7.306e-02 6.554e-07 1189 19921.33
1KB 7.372e-02 2.413e-06 620 18343.21
10KB 7.494e-02 9.883e-06 306 13852.88

Table 2

Figure 4: USL curves

As the message size increases, we can observe that the throughput drops significantly. As the message size increases from 500B to 1KB, contention (α) significantly increases from 7.306e-02 to 7.494e-02, and the coherency (β) increases from 6.554e-07 to 9.883e-06. Consequently, the concurrency level which starts to display the retrograde behavior decreases from 1189 at 500B message size to 306 at 1KB message size.

Using α and β, it is possible to identify the factors that hinder the performance of a software system. If you observe a high α value, then you should modify your software to minimize serialization (make things run in parallel). If you find that β is very high, it reflects that your software system has many dependencies between different threads (in most server implementations, there is a worker thread pool from which different threads are assigned to process concurrent requests. Interaction of these threads result in high β values). In that case, you should focus on ways to minimize thread inter-dependencies.

How to Use USL in the Software Development Process

When you develop an application software system, it is vital to focus on the scalability characteristics. USL provides a more systematic approach to handle this problem and this is a step-by-step approach that we can follow:

  1. Know the characteristics of the workload your system will get, for example message sizes.
  2. Have an idea as to how many maximum concurrent users your system should support and the maximum throughput.
  3. Get several throughput measurements against different concurrency levels (this should not be an exhaustive search).
  4. Apply USL.
  5. The maximum concurrency level and the maximum throughput predicted by USL at step 4 should be higher than your throughput value determined in step 2.
  6. If predicted maximum throughput by USL is less than your intended throughput at step 2, you should either modify the hardware and provide more resources and start over from step 3; or change your software so as to reduce α and β and re-start from step 3.
  7. Whenever modifications are done to your software system, you should follow the above steps to ensure that new software features do not affect the scalability characteristics of your system.

Summary

This article explored USL for software and its practical uses. It began with a discussion on how USL is derived by extending Amdahl’s Law, followed by how we can compute USL parameters using the R language library usl. Finally, we briefly discussed the scalability characteristics of WSO2 Enterprise Integrator using USL.

References

[1] Hennessy, J. L. and Patterson, D. A. (1996). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Francisco, CA, 2nd. Edition.

[2] Gunther, N. (2007). Guerrilla Capacity Planning. Springer, Berlin Heidelberg. https://doi.org/10.1007/978-3-540-31010-5

[3] Tennage (2018). How to automate JMeter Performance Tests. [online] Slideshare.net. Available at: https://www.slideshare.net/PasinduTennage/how-to-automate-jmeter- performance-tests [Accessed 7 Nov. 2018].

[4]Docs.wso2.com. (2018). WSO2 Enterprise Integrator Documentation - Enterprise Integrator 6.1.1 - WSO2 Documentation. [online] Available at: https://docs.wso2.com/display/EI611/WSO2+Enterprise+Integrator+Documentation [Accessed 7 Nov. 2018].