27 May, 2013

“Scale up early, scale down slowly” - Auto-scaling using WSO2 Elastic Load Balancer

Nirmal Fernando
Associate Director/ Architect - WSO2

Applies to

WSO2 Elastic Load Balancer 2.0.3

Content

Introduction
What is auto-scaling?
What is the basis for auto-scaling?
What are the decision making variables?
How do you calculate the number of requests in-flight?
What are the decision making functions?
- Scaling up function
- Scaling down function
Example scenario to demonstrate the auto-scaling algorithm
Conclusion

Introduction

In a distributed system, scalability is the ability to expand or contract the system's resource pool. A system can be scaled in two modes: vertically, by adding resources such as CPU or memory to an existing node, or horizontally, by adding more nodes. This article focuses on horizontal scaling, which means adding nodes to (and removing nodes from) a clustered distributed system.

In this article, you will learn about the auto-scaling algorithm used in the WSO2 Elastic Load Balancer, some tips to keep in mind when calibrating its auto-scaling decision making variables, and a sample scenario that shows the algorithm at work.

What is auto-scaling?

When the number of requests to an application suddenly peaks, we should ideally increase the resources provided to that application. Auto-scaling solves this: in an auto-scaling enabled system, the system itself detects such peaks and starts up new server instances to meet the demand, without any manual intervention.

With the evolution of cloud computing, we can now start new instances and terminate existing ones at any moment, which makes auto-scaling practical in a cloud environment.

What is the basis for auto-scaling?

The current default implementation (ServiceRequestsInFlightAutoscaler) uses the number of requests in flight for a particular service cluster as the basis for its auto-scaling decisions. The default algorithm follows the paradigm “scale up early and scale down slowly”.

What are the decision making variables?

There are a few of them, and all of the vital ones are configurable in the loadbalancer.conf file of the WSO2 Elastic Load Balancer.

autoscaler_task_interval (t) - time period between two iterations of the auto-scaling decision making task, in milliseconds. When configuring this value, consider the time a service instance takes to join the ELB. The default value is 30000 ms.
max_requests_per_second (Rps) - number of requests a service instance can withstand per second. It is recommended that you calibrate this value for each service instance and for different scenarios; the ideal way to estimate it is to load test a similar service instance. The default value is 100.
rounds_to_average (r) - an auto-scaling decision is made only after averaging the requests in flight over this many iterations of the auto-scaling decision making task. The default value is 10.
alarming_upper_rate (AUR) - rather than waiting for a service instance to reach its maximum request capacity (alarming_upper_rate = 1), we scale the system up when it reaches the fraction of that capacity given by alarming_upper_rate. This value should be between 0 and 1, and the default is 0.7.
alarming_lower_rate (ALR) - lower bound of the alarming rate, which hints that we can consider scaling the system down. This value should be between 0 and 1, and the default is 0.2.
scale_down_factor (SDF) - this factor makes the scaling down process slow. We scale down slowly to reduce the chance of scaling down because of a false-positive event, such as a brief dip in traffic. This value should be between 0 and 1, and the default is 0.25.
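To make the defaults and ranges concrete, here is a minimal Python sketch that collects the variables into one object. The class name AutoscalerConfig is hypothetical and is not part of the WSO2 code base. The exact range check (greater than 0, at most 1) is an assumption based on the "between 0 and 1" guidance above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalerConfig:
    """Hypothetical holder for the auto-scaling decision making variables.

    Field names mirror the loadbalancer.conf keys; defaults are the
    documented default values.
    """
    autoscaler_task_interval: int = 30000  # t, in milliseconds
    max_requests_per_second: float = 100   # Rps, per service instance
    rounds_to_average: int = 10            # r
    alarming_upper_rate: float = 0.7       # AUR
    alarming_lower_rate: float = 0.2       # ALR
    scale_down_factor: float = 0.25        # SDF

    def __post_init__(self) -> None:
        # The interval, capacity and window size must be positive.
        for name in ("autoscaler_task_interval", "max_requests_per_second",
                     "rounds_to_average"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # The three rates are fractions; the (0, 1] range is an assumption.
        for name in ("alarming_upper_rate", "alarming_lower_rate",
                     "scale_down_factor"):
            value = getattr(self, name)
            if not 0 < value <= 1:
                raise ValueError(f"{name} must be in (0, 1], got {value}")


# The documented defaults, and the values implied by the example scenario's arithmetic.
defaults = AutoscalerConfig()
example = AutoscalerConfig(autoscaler_task_interval=60000,
                           max_requests_per_second=5, rounds_to_average=2)
```

Freezing the dataclass keeps a configuration from being modified halfway through a decision cycle.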

How do you calculate the number of requests in-flight?

We keep track of the requests that reach the WSO2 Elastic Load Balancer for each service cluster. For each incoming request, we add a token against the relevant service cluster; when the message leaves the load balancer or expires, we remove the corresponding token. The number of tokens held for a cluster at any moment is its number of requests in flight.
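The token bookkeeping can be sketched as follows. This is a minimal Python illustration, not the ELB's Java implementation. The class and method names are hypothetical, and so is the fixed `timeout_s` expiry period, since the article does not say how expiry is determined. The clock is injectable so the behaviour can be checked without waiting.

```python
import itertools
import time
from collections import defaultdict
from typing import Callable, Dict


class InFlightTracker:
    """Hypothetical per-cluster token accounting for requests in flight."""

    def __init__(self, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s      # assumed expiry period for a request
        self._clock = clock
        self._next_id = itertools.count()
        # cluster name -> {token id: arrival time}
        self._tokens: Dict[str, Dict[int, float]] = defaultdict(dict)

    def request_arrived(self, cluster: str) -> int:
        """Add a token for a new request and return its id."""
        token = next(self._next_id)
        self._tokens[cluster][token] = self._clock()
        return token

    def request_left(self, cluster: str, token: int) -> None:
        """Remove the token when the message leaves the load balancer."""
        self._tokens[cluster].pop(token, None)  # ignore already-expired tokens

    def in_flight(self, cluster: str) -> int:
        """Drop expired tokens, then count the remaining ones."""
        now = self._clock()
        tokens = self._tokens[cluster]
        for token, arrived in list(tokens.items()):
            if now - arrived > self._timeout_s:
                del tokens[token]
        return len(tokens)
```

The autoscaler task would read `in_flight(cluster)` once per task iteration and feed that reading into the averaging step.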

What are the decision making functions?

We always respect the minimum and maximum number of instances configured for a service cluster: the system always maintains at least the minimum number of service instances, and it never scales beyond the maximum.
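One way to honour both limits is to clamp every raw decision to the configured range. The function below is a hypothetical Python sketch of that idea, not the ELB's actual code. It assumes a decision is expressed as +1 (scale up), -1 (scale down) or 0.

```python
def bounded_change(running: int, requested_change: int,
                   min_instances: int, max_instances: int) -> int:
    """Return the instance-count change that respects the cluster limits.

    requested_change is the raw decision: +1 (scale up), -1 (scale down) or 0.
    The resulting count is clamped to [min_instances, max_instances], so a
    cluster that is below its minimum is always topped up.
    """
    if not 0 <= min_instances <= max_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")
    target = running + requested_change
    target = max(min_instances, min(target, max_instances))
    return target - running
```

For example, `bounded_change(1, -1, 1, 5)` returns 0, because scaling down would drop the cluster below its minimum of one instance.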

We calculate,

average requests in-flight for a particular service cluster (avg) = (total number of requests in-flight over the last r iterations) * (1/r)
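The averaging step can be sketched with a fixed-size window. This is a minimal Python illustration with a hypothetical class name. It assumes the window slides, so the oldest reading drops out when a new one arrives. That is how the example scenario behaves: no decision is made until the vector holds r readings.

```python
from collections import deque
from typing import Optional


class RequestWindow:
    """Holds the in-flight counts of the last r task iterations (hypothetical)."""

    def __init__(self, rounds_to_average: int) -> None:
        self._r = rounds_to_average
        # deque with maxlen discards the oldest reading automatically
        self._values = deque(maxlen=rounds_to_average)

    def add(self, in_flight: int) -> None:
        self._values.append(in_flight)

    def average(self) -> Optional[float]:
        """avg = total * (1/r), or None while fewer than r readings exist."""
        if len(self._values) < self._r:
            return None
        return sum(self._values) / self._r
```

With `rounds_to_average=2` and the readings 10, 1, 250, the window reports no average after the first reading, then 5.5, then 125.5.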

Scaling up....

maximum number of requests that a service instance can withstand over an autoscaler task interval (maxRpt) = (Rps) * (t/1000) * (AUR), where t/1000 converts the task interval from milliseconds to seconds

Then, we decide to scale up if:

avg > maxRpt * (number of running instances of this service cluster)
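The scale-up rule translates directly into code. Below is a minimal Python sketch with hypothetical function names. The parameters map one-to-one onto Rps, t and AUR.

```python
def max_requests_per_task(rps: float, task_interval_ms: float, aur: float) -> float:
    """maxRpt = Rps * (t / 1000) * AUR, the per-instance scale-up threshold."""
    return rps * (task_interval_ms / 1000) * aur


def should_scale_up(avg: float, running: int, rps: float,
                    task_interval_ms: float, aur: float) -> bool:
    """Scale up when avg exceeds what the running instances should absorb."""
    return avg > max_requests_per_task(rps, task_interval_ms, aur) * running
```

For example, with Rps = 5, t = 60000 ms and AUR = 0.7, maxRpt is 210. An average of 220 requests in flight therefore triggers a scale-up when only one instance is running, but not when two are running.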

Scaling down....

imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF)

Then, we decide to scale down if:

avg < minRpt * (number of running instances of this service cluster - 1)
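The scale-down rule can be sketched in the same way. The function names are hypothetical, and the parameters map onto Rps, t, ALR and SDF.

```python
def min_requests_per_task(rps: float, task_interval_ms: float,
                          alr: float, sdf: float) -> float:
    """minRpt = Rps * (t / 1000) * ALR * SDF, the scale-down threshold."""
    return rps * (task_interval_ms / 1000) * alr * sdf


def should_scale_down(avg: float, running: int, rps: float,
                      task_interval_ms: float, alr: float, sdf: float) -> bool:
    """Scale down only if the load would stay low with one instance fewer."""
    return avg < min_requests_per_task(rps, task_interval_ms, alr, sdf) * (running - 1)
```

Because the threshold is multiplied by (running - 1), it is 0 for a single running instance, so this rule on its own never removes the last instance. Multiplying by SDF as well as ALR keeps minRpt well below maxRpt, which is what makes scaling down slower than scaling up.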

Example scenario to demonstrate the auto-scaling algorithm

Task iteration	1	2	3	4	5	6	7	8	9
Requests in-flight	10	1	250	190	350	400	160	15	0

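The example uses rounds_to_average = 2, max_requests_per_second = 5, autoscaler_task_interval = 60000 ms, alarming_upper_rate = 0.7, alarming_lower_rate = 0.2 and scale_down_factor = 0.25, so one instance can handle maxRpt = 5 * 60 * 0.7 = 210 requests per task interval. Since rounds_to_average is 2, each decision is based on a vector holding the two most recent readings (two columns of the table).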

Iteration 1:

Vector is not full → we cannot take a scaling decision.

Iteration 2:

Vector is full → we can take a scaling decision

Average requests in flight → (10 + 1) / 2 = 5.5

Running Instances → 1

Handle-able requests → 1* 210 = 210

5.5 < 210 → no need to scale

Iteration 3:

Vector → [1, 250]

Vector is full → we can take a scaling decision

Average requests in flight → (1 + 250) / 2 = 125.5

Running Instances → 1

Handle-able requests → 1* 210 = 210

125.5 < 210 → no need to scale

Iteration 4:

Vector → [250, 190]

Vector is full → we can take a scaling decision

Average requests in flight → (250 + 190) / 2 = 220

Running Instances → 1

Handle-able requests → 1* 210 = 210

220 > 210→ and pending instances=0→ scale up! → pending instances++

Iteration 5:

Vector → [190, 350]

Vector is full → we can take a scaling decision

Average requests in flight → (190 + 350) / 2 = 270

Running Instances → 1

Handle-able requests → 1* 210 = 210

270 > 210 → and pending instances=1 → we don't scale up again, because the instance requested in iteration 4 is still pending.

Iteration 6:

Vector → [350, 400]

Vector is full → we can take a scaling decision

Average requests in flight → (350 + 400) / 2 = 375

Running Instances → 2 (the pending instance has joined)

Handle-able requests → 2* 210 = 420

375 < 420 → and pending instances=0 → no need to scale up.

Iteration 7:

Vector → [400, 160]

Vector is full → we can take a scaling decision

Average requests in flight → (400 + 160) / 2 = 280

Running Instances → 2

Handle-able requests → 2* 210 = 420

280 < 420 → and pending instances=0 → no need to scale up.

imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

280 > 15 * 1 → we do not scale down, since we can't handle the current load with one less running instance.

Iteration 8:

Vector → [160, 15]

Vector is full → we can take a scaling decision

Average requests in flight → (160 + 15) / 2 = 87.5

Running Instances → 2

Handle-able requests → 2* 210 = 420

87.5 < 420 → and pending instances=0 → no need to scale up.

imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

87.5 > 15 * 1 → we do not scale down, since we can't handle the current load with one less running instance.

Iteration 9:

Vector is full → we can take a scaling decision

Average requests in flight → (15 + 0) / 2 = 7.5

Running Instances → 2

Handle-able requests → 2* 210 = 420

7.5 < 420 → and pending instances=0 → no need to scale up.

imaginary lower bound value (minRpt) = (Rps) * (t/1000) * (ALR) * (SDF) = 5 * 60 * 0.2 * 0.25 = 15

7.5 < 15 * 1 → we scale down, since the system has more running instances than it needs.
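The walkthrough above can be replayed with a short simulation. This is a minimal Python sketch, not the ServiceRequestsInFlightAutoscaler code. The function name and the defaults max_instances=5 and min_instances=1 are hypothetical. It also assumes that a requested instance joins two task iterations later, which matches the example: the instance requested in iteration 4 is still pending in iteration 5 and running in iteration 6. Finally, it assumes that no further scale-up is requested while an instance is pending.

```python
from collections import deque
from typing import List, Sequence


def simulate(readings: Sequence[int], rps: float = 5, task_interval_ms: int = 60000,
             rounds_to_average: int = 2, aur: float = 0.7, alr: float = 0.2,
             sdf: float = 0.25, running: int = 1, min_instances: int = 1,
             max_instances: int = 5, startup_iterations: int = 2) -> List[str]:
    """Replay the example scenario and return one decision per iteration.

    Assumptions for this sketch: a requested instance joins
    startup_iterations iterations later, and at most one instance is
    added or removed per iteration.
    """
    seconds = task_interval_ms / 1000
    max_rpt = rps * seconds * aur          # scale-up threshold per instance
    min_rpt = rps * seconds * alr * sdf    # scale-down threshold per instance
    window = deque(maxlen=rounds_to_average)
    pending: List[int] = []                # iterations at which pending instances join
    decisions: List[str] = []

    for iteration, in_flight in enumerate(readings, start=1):
        # Pending instances that have finished starting up become running.
        running += sum(1 for due in pending if due <= iteration)
        pending = [due for due in pending if due > iteration]

        window.append(in_flight)
        if len(window) < rounds_to_average:
            decisions.append("wait")       # vector not full yet
            continue

        avg = sum(window) / rounds_to_average
        if avg > max_rpt * running:
            if not pending and running < max_instances:
                pending.append(iteration + startup_iterations)
                decisions.append("scale up")
            else:
                decisions.append("hold")   # already pending, or at max_instances
        elif avg < min_rpt * (running - 1) and running > min_instances:
            running -= 1
            decisions.append("scale down")
        else:
            decisions.append("no change")
    return decisions


print(simulate([10, 1, 250, 190, 350, 400, 160, 15, 0]))
# ['wait', 'no change', 'no change', 'scale up', 'hold',
#  'no change', 'no change', 'no change', 'scale down']
```

The startup delay is the main knob here. With `startup_iterations=1`, iteration 5 reports "no change" instead of "hold", because the new instance already counts as running. That is one reason autoscaler_task_interval should be chosen with the instance join time in mind.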

Conclusion

In conclusion, the WSO2 Elastic Load Balancer supports horizontal auto-scaling based on the number of requests in flight for a particular service cluster. Its auto-scaling algorithm uses a set of auto-scaling parameters that you can define for each service cluster, according to your assessment of the load on that particular service.

Author

Nirmal Fernando, Software Engineer, WSO2 Inc.
