November 12, 2018

Scalability has been defined as the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth [1]. In a general sense, scaling is a geometric notion: we make a system bigger by stretching it in several directions. However, every system has limits to its growth. As the system is scaled up, overheads start to erode the gains. Beyond a critical point, the capability of the system actually decreases as the system is scaled up further.

The computing capability of a system can be scaled up. Up to a certain point, we observe a performance gain as the system is scaled up. Beyond a critical point, however, we no longer get the intended speed up as we provide the system with more and more resources. For example, consider a web server that handles 1000 transactions per second when run on a single computing node. If we increase the node count to 2, we expect a throughput of 2000 transactions per second. In reality, we find that the throughput of the two-node configuration is slightly less than 2000 transactions per second, and the shortfall worsens as the system is scaled up further. Beyond a particular threshold, the system displays retrograde speed up: adding nodes reduces throughput. Figure 1 shows how the system deviates from the ideal linear speed up.

Figure 1: Ideal linear speed up vs actual speed up

Hence, we cannot blindly scale up a system without a thorough analysis of the implications. A practical problem arises here: how are we going to know how our system will behave when we scale it up, without actually building it? To address this problem, we need a systematic way of analyzing the system that we can use to predict its behavior as it is scaled up. The universal scalability law is the answer!

The universal scalability law (USL) was first proposed by Neil Gunther in [2]. It is a black-box approach to performance modeling. Since then, it has been widely adopted by the research community as well as industry.

So what makes it universal?

It is universal because it can be applied to any hardware or software, across different CPU architectures. The parametric model proposed here can describe and predict system behavior irrespective of the underlying architecture.

This article will first present the USL formula as an extension of the widely known Amdahl's law. Then we present how to use the R package usl [3]. Finally, we present a concrete example of applying USL to hardware scalability.

Scale up and speed up are two widely used terms in performance analysis. Speed up is the reduction in execution time obtained by running a fixed workload (for instance, a sorting algorithm with a fixed number of input items) on an increasing number of hardware processors. The motivation here is to execute a fixed workload in less time. We will denote the speed up by S_{(p)}, where p is the number of processors. Scale up aims at maintaining the same execution time per user while increasing the workload in proportion to the number of processors allocated. We will denote the scaled up capacity by C_{(p)}.

Linear speed up assumes that a workload which takes T seconds to complete on a single processor will take only T/p seconds to complete on p parallel processors. As we have shown in figure 1, this is not observed in practical scenarios. Amdahl’s law explains why we don’t observe linear speed up.

Gene Amdahl showed that there is a fraction of the workload that cannot be parallelized, called the serial fraction: no matter how many cores are allocated to the program, only a single core executes the serial portion of the program. We use σ to denote this serial fraction of the workload. Hence (1 - σ) is the parallel fraction of the workload. σ takes values in the range (0, 1).

If T_{1} is the time spent to execute the workload using 1 processor, we can derive a formula for the time T_{p} it takes to execute the same workload using p processors.

T_{p} = σT_{1} + (1 - σ)T_{1}/p (Equation 1)

And S_{(p)} = T_{1}/T_{p}

Hence,

S_{(p)} = p / (1 + σ(p - 1)) (Equation 2)

Equation 2 is called the Amdahl's Law.

Figure 2 shows how seriality (σ) affects the speed up using an example where we chose σ = 0.2. As the number of nodes increases, the speed up converges to 1/σ = 5: by equation 2, S_{(p)} tends to 5 as p tends to infinity.
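Equation 2 is simple enough to check numerically. A minimal sketch, using the σ = 0.2 of figure 2:

```python
# Amdahl's law speed up (equation 2) with the serial fraction of figure 2.
def amdahl_speedup(p, sigma=0.2):
    return p / (1 + sigma * (p - 1))

print(amdahl_speedup(1))      # 1.0: a single processor gives no speed up
print(amdahl_speedup(16))     # 4.0
print(amdahl_speedup(10**6))  # close to the asymptote 1/sigma = 5
```

No matter how large p grows, the speed up never exceeds 1/σ.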

Figure 2: Amdahl’s Law speed up

NOTE: Depending on the context, p can have two meanings: the number of hardware processors or the number of nodes. p denotes the number of hardware processors when we assume that other resources, such as memory and I/O bandwidth, do not become bottlenecks as processors are added. In reality, when the number of processors grows very large, this assumption does not hold, as other resources start to become bottlenecks. In that case, we define p as the number of computing nodes.

Efficiency is defined as the ratio between speed up and the number of processors:

E_{(p)} = S_{(p)} / p (Equation 3)

If you substitute the expression for S_{(p)} from equation 2 into equation 3, you get E_{(p)} = 1 / (1 + σ(p - 1)). Its reciprocal, 1/E_{(p)} = (1 - σ) + σp, is linear in p with slope σ; that is, the rate of change of the inverse efficiency equals σ. Therefore seriality can be viewed as a direct measure of the loss of efficiency as processors are added [2].

The scaled up capacity C_{(p)} is the ratio between the throughput X_{p} achieved using p processors and the throughput X_{1} achieved using 1 processor.

C_{(p)} = X_{p} / X_{1} (Equation 4)

It can be shown that C_{(p)} has the same form as equation 2. That is,

C_{(p)} = p / (1 + σ(p - 1)) (Equation 5)

We will not cover the formal proof of equation 5 in this article. An interested reader can find a simple analytical proof of equation 5 in [2].

So, is σ the only cause of performance bottlenecks when scaling up a system? According to equation 2, as the number of processors increases, the speed up approaches the asymptote 1/σ. In practice, however, we observe retrograde speed up when the number of nodes is increased further. Amdahl's law does not capture this retrograde behavior. This is where the universal scalability law comes into play.

Universal Scalability Law (USL) is an extension of Amdahl’s law. It accounts for the additional overhead due to interprocess communication. Interprocess communication happens at multiple levels within the system: application software level, middleware level, operating system level, and hardware level.

This can be explained using a typical database application, where multiple server processes communicate with a single database. Each process can read from the database without explicitly communicating with the other processes, but when updating the database it must coordinate with them in order to maintain the ACID properties of transactions. If there are p processors running in parallel, each processor must communicate with the other p - 1 processors; hence, on the order of p(p - 1) interactions take place.
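As a quick sanity check on the p(p - 1) figure, this sketch counts the ordered interaction pairs directly:

```python
# Point-to-point interactions among p processors: each processor exchanges
# information with the other p - 1, giving p * (p - 1) ordered pairs in total.
def interactions(p):
    return sum(1 for a in range(p) for b in range(p) if a != b)

for p in (2, 4, 8, 16):
    print(p, interactions(p))  # 2, 12, 56, 240: quadratic growth
```

Doubling the processor count roughly quadruples the number of interactions, which is why this overhead eventually dominates.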

USL models this delay due to interprocess communication using a parameter k (kappa). Hence an additional term is added to the denominator of equation 5.

C_{(p)} = p / (1 + σ(p - 1) + kp(p - 1)) (Equation 6)

There we go! Equation 6 is the universal scalability law. Notice that when the parameter k equals 0, it reduces to Amdahl's law.

In summary, the USL is a rational function of the number of nodes (p), with two parameters: contention (σ), which accounts for the serial fraction of the workload, and coherency (k), which is the penalty for interprocess communication.

Being a rational function, C_{(p)} can be differentiated with respect to p to find the value of p that maximizes C_{(p)}. Recall that if F(x) is a differentiable function of x, the values of x at which the first derivative F’(x) equals zero correspond to the maxima or minima of F(x).

So, by differentiating C_{(p)} with respect to p and equating the result to 0, we get the value of p at which C_{(p)} attains its maximum. We will denote this value of p by p*. Hence, the maximum value of C_{(p)} is C_{(p*)}.

p* = √((1 - σ) / k) (Equation 7)

Figure 3 shows a sample USL curve with σ = 0.2 and k = 0.1. Note how the USL curve deviates from Amdahl’s law, and how the system starts to display retrograde speed up after p*.
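A short sketch makes the figure 3 numbers concrete; with σ = 0.2 and k = 0.1, equation 7 puts the peak just below 3 nodes:

```python
import math

# USL capacity (equation 6) for the figure 3 example: sigma = 0.2, k = 0.1.
def usl_capacity(p, sigma=0.2, kappa=0.1):
    return p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

# Equation 7: the node count at which C_(p) peaks.
p_star = math.sqrt((1 - 0.2) / 0.1)
print(p_star)  # roughly 2.83 nodes

# Capacity rises up to p* and then shows retrograde speed up.
print(usl_capacity(2), usl_capacity(3), usl_capacity(4))
```

Note that C_{(4)} is already lower than C_{(3)}: past p*, every added node costs capacity.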

Figure 3: USL vs Amdahl's Law speed up

So what is the actual use of modeling the maximum of C_{(p)}? How can we use it in real-life capacity planning? Well, it is quite intuitive: once you reach the maximum capacity, no matter how many new resources you add, you are not going to get any performance gain. In fact, performance will degrade. Scaling up a system beyond this point is therefore a huge waste. Thanks to USL, you can find the optimal number of nodes without actually testing on that many real nodes. Without USL, you would have to test your system at many node counts until you observed the retrograde speed up. With USL, you only need an initial set of performance measurements, from which the saturation point can be calculated.

So is that the only use of USL? Not really. USL also provides insight into what hinders the performance of the system. Using the values of σ and k, you can understand why your system does not perform as you expect. A high σ implies that your system has segments that run sequentially; you should then focus on how to parallelize those segments. On the other hand, a very high k implies that your system has strong dependencies among its processes; you should then look for ways to minimize the interactions between processes.

Now you might have a question: what if we use more than two parameters? Would they reveal additional information about the system? Neil Gunther [2] argues that more than two parameters are superfluous in a scalability model based on rational functions. A third parameter, associated with a cubic term, would not have any significant purchase over the two-parameter model.

Finally, is this the only model we can use for scalability analysis? Not really. Many other models have been published. Some of these include:

- Quadratic Model
- Exponential Model
- Geometric Model

We will not go into the details of these models since they are beyond the scope of this article.

We will now focus on how we can practically apply USL for real performance data.

Dataset

We use the data set presented in [4], which is based on experiments run on a multi-node machine. Table 1 summarizes the performance data.

Number of Nodes (p) | Throughput (Requests Per Second) (Xp)
1 | 955.16 (X1)
2 | 1878
3 | 2688
4 | 3548
5 | 4315
6 | 5130
7 | 5931
8 | 6531
9 | 7219
10 | 7867
11 | 8278
12 | 8646
13 | 9047
14 | 9426
15 | 9645
16 | 9897
17 | 10097
18 | 10240
19 | 10532
20 | 10798
21 | 11151
22 | 11518
23 | 11806
24 | 12089
25 | 12075
26 | 12177
27 | 12211
28 | 12158
29 | 12155
30 | 12118
31 | 12140
32 | 12074

Table 1: Performance data

If you analyze the data carefully, you will notice that as the number of nodes (physical machines) increases, throughput rises to 12211 requests per second at 27 nodes, and then starts to display retrograde behavior. You might assume the value of p* to be 27 in this case. But remember, C_{(p)} is a continuous function of p, not a discrete one. Hence the value of p* can differ slightly from 27.

One major assumption when measuring throughput at different node counts is that the ratio N/p is held constant across all node configurations, where N is the number of software processes and p is the number of nodes. In other words, the workload grows in proportion to the node count, with each additional node executing its share of additional processes.

Next, we will focus on how to compute the two parameters of USL from these data. Gunther [2] provides an analytical approach to this task: modifying the USL equation to obtain a polynomial of degree two, and then using regression to estimate its coefficients. We will not walk through the regression steps in this article, but will explore how the R language can compute the USL coefficients for us.

R has a package named usl [3], which provides functionality to compute the parameters of USL. Moreover, it computes additional useful quantities such as efficiency, residual standard error, multiple R-squared, the maximum capacity C_{(p*)}, and the corresponding number of nodes (p*). A short R script applies the package to the dataset in table 1.
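For readers without R, the same fit can be sketched in Python with scipy.optimize.curve_fit. This is an illustrative stand-in for the usl package, not the original script; it also fits a throughput scale λ alongside σ and k, an assumption of this sketch, since equation 6 itself models the normalized capacity C_{(p)}:

```python
import numpy as np
from scipy.optimize import curve_fit

# Throughput measurements from table 1: X_p for p = 1..32 nodes.
nodes = np.arange(1, 33)
throughput = np.array([
    955.16, 1878, 2688, 3548, 4315, 5130, 5931, 6531, 7219, 7867,
    8278, 8646, 9047, 9426, 9645, 9897, 10097, 10240, 10532, 10798,
    11151, 11518, 11806, 12089, 12075, 12177, 12211, 12158, 12155,
    12118, 12140, 12074,
])

# USL throughput model: equation 6 scaled by lam, the throughput of a single
# node, so the fit is done in requests per second rather than in C_(p).
def usl(p, lam, sigma, kappa):
    return lam * p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

# Non-linear least squares; the starting point p0 is a rough guess.
(lam, sigma, kappa), _ = curve_fit(usl, nodes, throughput, p0=(1000.0, 0.02, 0.001))

p_star = float(np.sqrt((1 - sigma) / kappa))  # equation 7
print(f"lambda = {lam:.1f}, sigma = {sigma:.4f}, kappa = {kappa:.6f}, p* = {p_star:.1f}")
```

Because the fit uses all 32 points, the estimated p* generally differs from the apparent peak at 27 nodes in the raw data.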

Figure 4 depicts the USL fit of the dataset.

Figure 4: USL fit of data in table 1

Following is the summary report generated by this script.

Figure 5: Summary report

Wait, didn’t we say earlier that p* should be close to 27? But USL has predicted something different: according to the above results, the peak of C_{(p)} occurs at p = 35 nodes. So what is wrong here?

The behavior observed at 27 nodes is a false alarm. Owing to fluctuations in the measurements, the system shows some retrograde speed up at 27 nodes, but this is not actually the point where it starts to show diminishing returns. This is the real value of USL: without it, we would have concluded that the system cannot scale beyond 27 nodes.

In practice, we need at least half a dozen data points to calculate the USL parameters. The more data points we have, the more accurate the results.

Is it only hardware scalability that can be modeled using USL? Not really. As the name suggests, USL can model the scalability of both hardware and software.

In this article, we discussed the universal scalability law and some of its practical uses. We started by discussing the terminology and background of USL. Then we showed how USL is derived by extending Amdahl’s law. Finally, we discussed how to compute the USL parameters using the R package usl. Our next article will discuss how USL can be applied to model software scalability.

[1] En.wikipedia.org. (2018). Scalability. [online] Available at: https://en.wikipedia.org/wiki/Scalability [Accessed 30 Oct. 2018].

[2] Gunther, N. (2007). Guerrilla Capacity Planning. Springer, Berlin Heidelberg. https://doi.org/10.1007/978-3-540-31010-5

[3] Möding, S. (2014). Analyze system scalability in R with the Universal Scalability Law, 1–13.

[4] Schwartz, B. (2015). Universal Scalability Law, 1–52.