“Scale up early, scale down slowly” - Auto-scaling using WSO2 Elastic Load Balancer

  • By Nirmal Fernando
  • 27 May, 2013

Applies to

WSO2 Elastic Load Balancer 2.0.3

Content

  1. Introduction
  2. What is auto-scaling?
  3. What is the basis for auto-scaling?
  4. How do you calculate the number of requests in-flight?
  5. What are the decision making functions?
  6. Example scenario to demonstrate the auto-scaling algorithm
  7. Conclusion

Introduction

In a distributed system, scalability is the ability to expand or contract the system's resource pool. A system can be scaled in two modes: horizontal and vertical. What we are interested in here is horizontal scaling, which means adding more nodes to a clustered distributed system.

In this article, you will learn the auto-scaling algorithm used in the WSO2 Elastic Load Balancer, some tips you should keep in mind when calibrating auto-scaling decision making variables, and a brief explanation of a sample scenario.

What is auto-scaling?

When requests to an application suddenly peak, we should ideally increase the amount of resources provided for that application. The solution is auto-scaling. In an auto-scaling enabled system, the system itself detects such peaks and starts up new server instances to cater to the demand, without any manual intervention.

With the evolution of cloud computing, we can now easily start new instances and terminate existing instances at any given moment, which makes auto-scaling feasible in a cloud environment.

What is the basis for auto-scaling?

The current default implementation (ServiceRequestsInFlightAutoscaler) uses the number of requests in-flight for a particular service cluster as the basis for making auto-scaling decisions. The default algorithm follows the paradigm “scale up early and scale down slowly”.

What are the decision making variables?

There are a few of them, and all of the vital ones are configurable in the loadbalancer.conf file of the WSO2 Elastic Load Balancer.

  • autoscaler_task_interval (t) - the time period between two iterations of the ‘auto-scaling decision making’ task. When configuring this value, consider the time that a service instance takes to join the ELB. The value is in milliseconds and the default is 30000 ms.
  • max_requests_per_second (Rps) - the number of requests a service instance can withstand per second. It is recommended that you calibrate this value for each service instance and also for different scenarios. The ideal way to estimate this value is to load test a similar service instance. The default value is 100.
  • rounds_to_average (r) - an auto-scaling decision is made only after averaging the requests in-flight over this many iterations of the ‘auto-scaling decision making’ task. The default value is 10.
  • alarming_upper_rate (AUR)- without waiting for the service instance to reach its maximum request capacity (alarming_upper_rate = 1), we scale the system up when it reaches the request capacity corresponding to alarming_upper_rate. This value should be 0
  • alarming_lower_rate (ALR) - lower bound of the alarming rate, which gives us a hint that we can think of scaling down the system. This value should be 0
  • scale_down_factor (SDF) - this factor is needed in order to make the scaling down process slow. We need to scale down slowly to reduce scaling down due to a false-positive event. This value should be 0
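The variables above can be gathered into a single settings object, as in the following minimal sketch. The class name `AutoscalerSettings` and the helper `requests_per_interval` are hypothetical names introduced for illustration; they are not part of the WSO2 Elastic Load Balancer or of the loadbalancer.conf syntax. Only the three defaults stated above are filled in; the two rates and the scale-down factor must be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalerSettings:
    """Hypothetical holder for the decision making variables (illustration only)."""

    # No defaults are assumed for these three; pass calibrated values.
    alarming_upper_rate: float  # AUR
    alarming_lower_rate: float  # ALR
    scale_down_factor: float    # SDF

    # Defaults as stated in the list above.
    autoscaler_task_interval: int = 30000  # t, in milliseconds
    max_requests_per_second: int = 100     # Rps
    rounds_to_average: int = 10            # r

    def requests_per_interval(self) -> float:
        """Requests one instance can withstand in one task interval,
        before any alarming rate is applied: Rps * (t / 1000)."""
        return self.max_requests_per_second * (self.autoscaler_task_interval / 1000)
```

With the stated defaults, one instance can withstand 100 * 30 = 3000 requests per task interval before the alarming rates are applied.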

How do you calculate the number of requests in-flight?

We keep track of the requests that come to the WSO2 Elastic Load Balancer for various service clusters. For each incoming request, we add a token against the relevant service cluster and when the message leaves the Load Balancer or expires, we remove the corresponding token.
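The token bookkeeping described above can be sketched as a per-cluster counter. This is an illustrative sketch only: `InFlightTracker` and its method names are hypothetical and do not reflect the actual classes inside the Load Balancer.

```python
from collections import Counter


class InFlightTracker:
    """Hypothetical per-cluster token counter (illustration only)."""

    def __init__(self):
        # Maps a service cluster name to its number of outstanding tokens.
        self._tokens = Counter()

    def request_received(self, cluster):
        """Add a token when a request for the cluster reaches the load balancer."""
        self._tokens[cluster] += 1

    def request_finished(self, cluster):
        """Remove a token when the message leaves the load balancer or expires."""
        if self._tokens[cluster] > 0:
            self._tokens[cluster] -= 1

    def in_flight(self, cluster):
        """Current number of requests in-flight for the cluster."""
        return self._tokens[cluster]
```

The value returned by `in_flight` at each task iteration is the sample that feeds the averaging step.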

What are the decision making functions?

We always respect the minimum and maximum number of instances configured for each service cluster. That is, we make sure that the system always maintains the required minimum number of service instances and never scales beyond its configured maximum.

We calculate,

average requests in-flight for a particular service cluster (avg) = total number of requests in-flight * (1/r)
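The averaging step can be sketched with a fixed-size window that holds the last r samples and only yields an average once it is full. `RequestWindow` is a hypothetical name used for illustration; the sketch assumes one requests in-flight sample is recorded per task iteration.

```python
from collections import deque


class RequestWindow:
    """Hypothetical window over the last r requests in-flight samples."""

    def __init__(self, rounds_to_average):
        self._rounds = rounds_to_average
        # The oldest sample is dropped automatically once r samples are held.
        self._samples = deque(maxlen=rounds_to_average)

    def record(self, requests_in_flight):
        """Store the sample taken in the current task iteration."""
        self._samples.append(requests_in_flight)

    def is_full(self):
        return len(self._samples) == self._rounds

    def average(self):
        """avg = total requests in-flight * (1/r), or None until r samples exist."""
        if not self.is_full():
            return None
        return sum(self._samples) * (1 / self._rounds)
```

For example, with r = 2, recording 10 and then 1 gives an average of 5.5; no average is available after the first sample alone.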

Scaling up....

number of maximum requests that a service instance can withstand over an autoscaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR) 

then, we decide to scale up, if,

avg > maxRpt * (number of running instances of this service cluster)

Scaling down....

imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) 

then, we decide to scale down, if,

avg < minRpt * (number of running instances of this service cluster - 1)
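The two thresholds and the decision rules can be written as small functions, as in the sketch below. The function names and the string return values are hypothetical, and the way the minimum and maximum instance counts are enforced is an assumption made for illustration; the formulas themselves follow the definitions above.

```python
def max_rpt(rps, t_ms, aur):
    """maxRpt = Rps * (t / 1000) * AUR"""
    return rps * (t_ms / 1000) * aur


def min_rpt(rps, t_ms, alr, sdf):
    """minRpt = Rps * (t / 1000) * ALR * SDF"""
    return rps * (t_ms / 1000) * alr * sdf


def decide(avg, running, max_rpt_value, min_rpt_value, min_instances, max_instances):
    """Return 'scale up', 'scale down' or 'hold' for one service cluster.

    Assumption: the instance limits are checked together with each rule.
    """
    if avg > max_rpt_value * running and running < max_instances:
        return "scale up"
    if avg < min_rpt_value * (running - 1) and running > min_instances:
        return "scale down"
    return "hold"
```

Note that the scale-down rule compares the average against the capacity of one fewer instance, so a cluster with a single running instance never satisfies it for a non-negative average.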

Example scenario to demonstrate the auto-scaling algorithm

Task iteration       1    2    3    4    5    6    7    8    9
Requests in-flight   10   1    250  190  350  400  160  15   0

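Since the rounds_to_average value is 2 in this scenario, let’s use a request frame (vector) of two columns. In the calculations below, each running instance can handle 210 requests per task interval (maxRpt), and the lower bound per instance (minRpt) is 15.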

Iteration 1:

10
Vector is not full → we cannot take a scaling decision.

Iteration 2:

10 1
Vector is full → we can take a scaling decision

Average requests in flight → (10 + 1) / 2 = 5.5

Running Instances → 1

Handle-able requests → 1 * 210 = 210

5.5 < 210 → we do not scale up.

Iteration 3:

1 250
Vector is full → we can take a scaling decision

Average requests in flight → (1 + 250) / 2 = 125.5

Running Instances → 1

Handle-able requests → 1 * 210 = 210

125.5 < 210 → we do not scale up.

Iteration 4:

250 190
Vector is full → we can take a scaling decision

Average requests in flight → (250 + 190) / 2 = 220

Running Instances → 1

Handle-able requests → 1 * 210 = 210

220 > 210 → and pending instances = 0 → scale up! → pending instances++

Iteration 5:

190 350
Vector is full → we can take a scaling decision

Average requests in flight → (190 + 350) / 2 = 270

Running Instances → 1

Handle-able requests → 1 * 210 = 210

270 > 210 → and pending instances = 1 → we don't scale up!

Iteration 6:

350 400
Vector is full → we can take a scaling decision

Average requests in flight → (350 + 400) / 2 = 375

Running Instances → 2

Handle-able requests → 2 * 210 = 420

375 < 420 → we do not scale up.

Iteration 7:

400 160
Vector is full → we can take a scaling decision

Average requests in flight → (400 + 160) / 2 = 280

Running Instances → 2

Handle-able requests → 2 * 210 = 420

280 < 420 → we do not scale up. 280 > 15 * 1 → we do not scale down, since we can't handle the current load with one fewer running instance!

Iteration 8:

160 15
Vector is full → we can take a scaling decision

Average requests in flight → (160 + 15) / 2 = 87.5

Running Instances → 2

Handle-able requests → 2 * 210 = 420

87.5 < 420 → we do not scale up. 87.5 > 15 * 1 → we do not scale down, since we can't handle the current load with one fewer running instance!

Iteration 9:

15 0
Vector is full → we can take a scaling decision

Average requests in flight → (15 + 0) / 2 = 7.5

Running Instances → 2

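Handle-able requests → 2 * 210 = 420

7.5 < 420 → we do not scale up. 7.5 < 15 * 1 → we can handle the current load with one fewer running instance → scale down!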
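The scenario can be replayed with a short simulation, sketched below under stated assumptions. The values 210 and 15 are taken from the calculations in the scenario. The function name `simulate`, the `JOIN_DELAY` constant (a pending instance is counted as running two iterations after it was requested, matching iteration 6 above) and the minimum instance count of 1 are assumptions made for illustration, not settings documented in this article.

```python
from collections import deque

REQUESTS_IN_FLIGHT = [10, 1, 250, 190, 350, 400, 160, 15, 0]
ROUNDS_TO_AVERAGE = 2
MAX_RPT = 210   # handle-able requests per instance, from the scenario
MIN_RPT = 15    # lower bound per instance, from the scenario
JOIN_DELAY = 2  # assumption: iterations until a pending instance is running


def simulate(samples, running=1, min_instances=1):
    """Replay the samples and return one decision label per task iteration."""
    window = deque(maxlen=ROUNDS_TO_AVERAGE)
    pending, join_at = 0, None
    decisions = []
    for iteration, sample in enumerate(samples, start=1):
        # A pending instance becomes a running instance once it has joined.
        if pending and iteration >= join_at:
            running += pending
            pending = 0
        window.append(sample)
        if len(window) < ROUNDS_TO_AVERAGE:
            decisions.append("wait")  # vector is not full yet
            continue
        avg = sum(window) / ROUNDS_TO_AVERAGE
        if avg > MAX_RPT * running and pending == 0:
            pending += 1
            join_at = iteration + JOIN_DELAY
            decisions.append("scale up")
        elif running > min_instances and avg < MIN_RPT * (running - 1):
            running -= 1
            decisions.append("scale down")
        else:
            decisions.append("hold")
    return decisions


if __name__ == "__main__":
    for i, decision in enumerate(simulate(REQUESTS_IN_FLIGHT), start=1):
        print(f"Iteration {i}: {decision}")
```

Under these assumptions, the replay scales up once at iteration 4, holds while the new instance is pending at iteration 5, and applies the scale-down rule at iteration 9.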

Conclusion

In conclusion, the WSO2 Elastic Load Balancer supports horizontal auto-scaling based on the number of requests in-flight for a particular service cluster. Its auto-scaling algorithm uses a set of auto-scaling parameters, which you can define for each service cluster according to your assessment of the load on that particular service.

Author

Nirmal Fernando, Software Engineer, WSO2 Inc.