Auto scaling web services on Amazon EC2

Archived Content
This article is provided for historical perspective only and may not reflect current conditions. Please refer to the relevant product page for more up-to-date product information and resources.
  • By Amila Suriarachchi
  • 12 Feb, 2010



Many web applications, including eBanking, eCommerce, and eGovernance applications, are subject to large load fluctuations. Some of these loads are predictable: if there is a new campaign, for example, a high load can be expected for that period. Other loads are seasonal; eCommerce applications can face huge loads at Christmas or at the end of the month when people receive their salaries. In addition, new sites, and especially social network sites, can face sudden loads driven by the importance of the information they carry. Today, therefore, almost all online applications face the problem of fluctuating loads. The traditional way of dealing with this problem is over-provisioning: allocating resources for the peak load. However, for most of the time the system receives only an average load, so resources are wasted under normal conditions.


The concept of auto scaling was introduced to address the peak load provisioning problem. An auto scaled system should be able to increase its processing power by starting new nodes at peak load and reduce it by shutting down nodes that are no longer needed. This requirement itself calls for a computing framework that can start up and shut down nodes dynamically.

Amazon Elastic Compute Cloud (EC2) is such a framework: a web application can be deployed as an Amazon instance, and further instances can be created dynamically as necessary. There must then be a way to observe system parameters such as latency and the CPU utilization of the nodes in order to determine when to auto scale. Amazon CloudWatch provides such a measuring framework for almost all of these parameters. Amazon Auto Scaling is a framework that lets users auto scale applications deployed on EC2 using the measurements obtained from Amazon CloudWatch.

WSO2 Web Services Application Server (WSAS) can be used to host web services, so we can use WSO2 WSAS to deploy a web service instance on EC2. The rest of this article describes how such a web service can be made to auto scale using the Amazon Auto Scaling framework.

What is auto scaling?

As described earlier, auto scaling means scaling the system up or down according to the load it receives, by starting new nodes or shutting down existing ones. But what is the user-perceived difference between an auto scaling system and a non auto scaling system (one provisioned for an average load), given that the user is not aware of whether new nodes are started or not? To understand this, we need to understand how a system behaves under load. Here we assume a system that does not have peak load provisioning, so its processing power saturates beyond some load.

Initially, as the load increases, the system throughput increases with it, since the load is not yet enough to saturate the system. Once the system is saturated, throughput remains constant because all the available resources are fully utilized. At this stage, if CPU usage has reached 100% it is a CPU-bound system, and if I/O usage has reached 100% it is an I/O-bound system. If the load increases further still, throughput decreases, since the system can no longer withstand the load.

Therefore a system should be auto scaled only after it reaches its saturation state, since before that the system still has unused resources. At this stage (i.e., where throughput remains constant) the system response time (latency) varies with the load as follows.

As can be seen, response time increases as the system load increases. This is not a desirable property, since it increases the time users have to wait for a response; the application response time should be kept constant. The only way to bring the response time back down is to increase the processing capacity, that is, to start new nodes. We can therefore define an auto scaled system as one that keeps the response time constant while increasing the throughput under a fluctuating load. For an auto scaled system, throughput and response time should look like this.
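The saturation behaviour described above can be sketched with a minimal Python model. This is an illustrative sketch only, not from the article: the service time and single-node capacity are assumed numbers, and the queueing delay is a rough linear approximation of how latency grows once the offered load exceeds what the node can sustain.

```python
# Assumed values for illustration only; they are not measurements
# from the article's test setup.
SERVICE_TIME = 0.7   # seconds per request at light load
CAPACITY = 14.0      # requests/second one node can sustain

def throughput(offered_load):
    """Completed requests/second: tracks the offered load until the
    node saturates, then stays capped at the node's capacity."""
    return min(offered_load, CAPACITY)

def latency(offered_load):
    """Rough response time: the base service time below saturation,
    growing in proportion to the excess load above saturation."""
    if offered_load <= CAPACITY:
        return SERVICE_TIME
    return SERVICE_TIME * (offered_load / CAPACITY)

for load in (7, 14, 28, 42):
    print(load, throughput(load), round(latency(load), 2))
```

Doubling the load past saturation leaves throughput flat but doubles the latency, which is exactly the plateau-then-climb shape the article describes.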

Creating an Amazon EC2 image with WSAS installed

In order to send requests, we need a service. This service should carry enough processing load to produce a measurable latency. Such a loaded service can be created by generating code from a WSDL file for the service and adding some iterations that parse the request and serialize the response. The file contains all the source and generated code for the service.

Amazon EC2 provides a set of command line tools which can be used to handle any task with Amazon EC2. Throughout this article these command line tools are used for various purposes; many manuals can be found for configuring the command line tools and obtaining Amazon EC2 accounts. Creation of a new Amazon EC2 image can be started from an existing Amazon EC2 instance. The existing ami-0d729464 image can be started with the following command, which comes with the command line tool set. Before proceeding with the sample commands, users should have configured their private keys to access their Amazon EC2 accounts.

ec2-run-instances -k amila_test_key ami-0d729464

This command returns the instance id of the newly created instance. This can be used to find the public DNS address of the service.

ec2-describe-instances i-08514b60

The above command returns the public DNS address of the service, which can be used to log into the server.

ssh -i /home/amila/projects/autoscale/keypair/id-amila_test_key [email protected] 

This command prompts you to verify the key of the server. Note that it is not necessary to provide a password, since the key pair is passed for this instance. The following commands can be used to prepare the WSO2 WSAS instance. First, a JDK has to be installed.

apt-get update
apt-get install sun-java6-jdk

After that, wget can be used to download the WSO2 WSAS 3.1.3 distribution.

The downloaded archive can be extracted using the unzip command.

apt-get install unzip

WSO2 WSAS can be started as given below.

export JAVA_HOME=/usr/lib/jvm/java-6-sun
sh /root/wso2wsas-3.1.3/bin/wso2server.sh start

This starts WSO2 WSAS as a daemon. After some time, the admin console URL can be checked to verify that the server is up.

Now the server can be stopped, and the above two commands can be put into the /etc/rc.local file so that the server starts at instance boot.

sh /root/wso2wsas-3.1.3/bin/wso2server.sh stop

Then the PerfService.aar file should be copied to the services folder.

scp -i /home/amila/projects/autoscale/keypair/id-amila_test_key PerfService.aar [email protected]:/root/wso2wsas-3.1.3/repository/services

Now the instance contains everything needed to start the service. Let's create an Amazon image from it.

First, the cert and pk files should be copied from the local machine to the newly created instance. The cert file names and the -u value should be replaced with the correct values according to the user's account details.

scp -i id-amila_test_key cert-ZOSNARTUQKXBJZAUUX2QUDN3GNEHL6K.pem pk-ZOSNARTUQKXIIIBX2QUDN3GNEHL6K.pem [email protected]:

Then the following commands can be used at the newly created instance to create the new image.

ec2-bundle-vol -d /mnt -k pk-ZOSNARTUQKXBJZAGWBX2QUDN3GNEHL6K.pem -u 6109906798 --cert cert-ZOSNARTUQKXBJZAGWBX2QUDN3GNEHL6K.pem
ec2-upload-bundle -m /mnt/image.manifest.xml -a 0Y6WUIDPHER2 -s qwhlHJHJHZC3qfCR9rth4EWOpY83L -b wso2wsas313

Finally the image can be registered with the below given command at the local machine.

ec2-register wso2wsas313/image.manifest.xml

This returns the newly created image id, which can be used to create new instances.

Configuring a non auto scaling system using Amazon EC2 load balancer

Before creating an auto scaling system, it is better to create a non auto scaling system and study how its response time varies with the load. Let's create a non auto scaling system using an Amazon load balancer to understand the non scaling nature of the system. This load balancer listens on ports 80 and 443 and forwards requests to ports 9763 and 9443, the default WSAS HTTP and HTTPS ports.

elb-create-lb  staticlb --headers --listener "lb-port=80,instance-port=9763,protocol=http" --listener "lb-port=443,instance-port=9443,protocol=tcp" --availability-zones us-east-1c

This returns the public DNS name of the load balancer. Then a health check can be configured to check the healthiness of the instances behind the load balancer. This is an optional task, but it is recommended. The health check pings the given TCP port periodically and checks the healthiness of each instance.

elb-configure-healthcheck  staticlb --headers --target "TCP:9763" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2
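The threshold flags above can be sketched in a few lines of Python. This is an illustrative model, not part of the Amazon tooling: an instance is taken out of service only after `--unhealthy-threshold` consecutive failed checks, and brought back only after `--healthy-threshold` consecutive successful ones, so a single transient failure does not flip its state.

```python
# Values taken from the elb-configure-healthcheck command above.
UNHEALTHY_THRESHOLD = 2
HEALTHY_THRESHOLD = 2

def classify(check_results, healthy=True):
    """Walk a sequence of True/False health check results and return
    the final health state, applying the consecutive-count thresholds."""
    fail_streak = 0
    ok_streak = 0
    for ok in check_results:
        if ok:
            ok_streak += 1
            fail_streak = 0
            if not healthy and ok_streak >= HEALTHY_THRESHOLD:
                healthy = True
        else:
            fail_streak += 1
            ok_streak = 0
            if healthy and fail_streak >= UNHEALTHY_THRESHOLD:
                healthy = False
    return healthy

print(classify([True, False, True]))          # one failed check is not enough: True
print(classify([False, False]))               # two consecutive failures: False
print(classify([False, False, True, True]))   # recovers after two successes: True
```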

Now the load balancer is configured. Let's start an instance with the new image and register it with the load balancer.

ec2-run-instances ami-abdc30c2 -k amila_test_key
elb-register-instances-with-lb staticlb --instances i-84041eec

Now the instance is registered with the load balancer. That can be verified with the following command.

elb-describe-instance-health staticlb --headers

This should give output similar to the following if everything works fine.

INSTANCE-ID  i-84041eec   InService

Testing the system with a load

The Apache Bench (ab) tool can be used as a client to send load to the server. A predetermined load can be sent using different concurrency levels. The file contains all the details necessary for the client. The client is run with different concurrency levels in order to measure the latencies. For this test, loads of 10, 20, 30, and 40 threads are used, with the latency measured from another Amazon EC2 instance.

Amazon EC2 provides a monitoring framework called Amazon CloudWatch. This can be used to obtain many measurements of the system (e.g., Latency, CPUUtilization). Let's see how the latency varied with time using the following command.

mon-get-stats Latency --start-time 2010-02-09T12:30:00.000Z --end-time 2010-02-09T14:30:00.000Z --period 120 --statistics "Average" --namespace "AWS/ELB" --headers
Time                 Samples  Average              Unit
2010-02-09 12:54:00  1098.0   0.7402949271402551   Seconds
2010-02-09 12:56:00  1535.0   0.712273029315961    Seconds
2010-02-09 12:58:00  1145.0   0.7465975458515285   Seconds
2010-02-09 13:00:00  1534.0   0.7295192307692307   Seconds
2010-02-09 13:02:00  1489.0   0.7299966554734721   Seconds
2010-02-09 13:04:00  1530.0   0.7443238235294118   Seconds
2010-02-09 13:06:00  1504.0   0.7622699933510638   Seconds
2010-02-09 13:08:00  1349.0   1.4617230096367678   Seconds
2010-02-09 13:10:00  1152.0   1.5603545659722223   Seconds
2010-02-09 13:12:00  1530.0   1.5746869934640523   Seconds
2010-02-09 13:14:00  1529.0   1.5747163243950296   Seconds
2010-02-09 13:16:00  1528.0   1.5761940706806282   Seconds
2010-02-09 13:18:00  1527.0   1.577572514734774    Seconds
2010-02-09 13:20:00  1531.0   1.5558620640104506   Seconds
2010-02-09 13:22:00  964.0    2.3721676348547718   Seconds
2010-02-09 13:24:00  1527.0   2.386863464309103    Seconds
2010-02-09 13:26:00  1533.0   2.3824127984344423   Seconds
2010-02-09 13:28:00  1535.0   2.3786780325732898   Seconds
2010-02-09 13:30:00  1533.0   2.3797070515329422   Seconds
2010-02-09 13:32:00  1529.0   2.3878389666448663   Seconds
2010-02-09 13:34:00  1398.0   2.398446494992847    Seconds
2010-02-09 13:36:00  1095.0   3.206490292237443    Seconds
2010-02-09 13:38:00  1526.0   3.224341749672346    Seconds
2010-02-09 13:40:00  1527.0   3.212400713817944    Seconds
2010-02-09 13:42:00  1533.0   3.1931953359425963   Seconds 
2010-02-09 13:44:00  1523.0   3.20844050558109     Seconds 
2010-02-09 13:46:00  1528.0   3.1996112696335083   Seconds
2010-02-09 13:48:00  1148.0   3.210251245644599    Seconds
2010-02-09 13:50:00  120.0    3.1848214166666666   Seconds

Looking at the results, it can be seen that the latency increases with the concurrency level while the same number of requests is executed within a given time period.

Configuring an auto scaled system using an Amazon load balancer

First, a load balancer should be created, as in the non auto scaling case, to distribute the load to the nodes.

elb-create-lb  autoscalelb --headers --listener "lb-port=80,instance-port=9763,protocol=http" --listener "lb-port=443,instance-port=9443,protocol=tcp" --availability-zones us-east-1c

Configuring a health check is important in this case, since the WSAS server may not have started even though the Amazon instance has. The health check periodically pings the given TCP port of each node instance and checks its healthiness.

elb-configure-healthcheck  autoscalelb --headers --target "TCP:9763" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

In an auto scaling system, the Amazon EC2 framework launches new Amazon instances. To launch them it uses a launch configuration, which contains the parameters for launching an Amazon EC2 instance. These parameters are almost the same as the ones passed when launching an Amazon EC2 instance manually.

as-create-launch-config autoscalelc --image-id ami-abdc30c2 --instance-type m1.small --key amila_test_key

An auto scaling group consists of the set of nodes created by the system using the parameters specified in the given launch configuration. Users can define the minimum and maximum number of nodes in the system. The system adds instances to or removes instances from the auto scaling group according to the actions specified in the triggers.

as-create-auto-scaling-group autoscleasg --availability-zones us-east-1c --launch-configuration autoscalelc --min-size 1 --max-size 10 --load-balancers autoscalelb
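The min/max bounds above can be sketched as a simple clamping rule. This is an illustrative sketch, not the Amazon implementation: whatever increment a trigger requests, the group size never leaves the [--min-size, --max-size] range.

```python
# Bounds taken from the as-create-auto-scaling-group command above.
MIN_SIZE, MAX_SIZE = 1, 10

def resize(current_size, increment):
    """Apply a trigger's breach increment, clamped to the group bounds."""
    return max(MIN_SIZE, min(MAX_SIZE, current_size + increment))

print(resize(1, -1))   # already at the minimum, stays 1
print(resize(10, 1))   # already at the maximum, stays 10
print(resize(3, 1))    # grows to 4
```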

Finally, a trigger should be configured with the scaling actions to take according to the load. This is the most important step in setting up an auto scaled system. Triggers are used either to start new nodes or to remove existing ones: a trigger fires when a given measurement goes below the lower threshold or above the upper threshold. The following command can be used to create the trigger.

as-create-or-update-trigger autoscaletrigger --auto-scaling-group autoscleasg --namespace "AWS/ELB" --measure Latency --statistic Average --dimensions "LoadBalancerName=autoscalelb" --period 60 --lower-threshold 0.5 --upper-threshold 1.2 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 120

This trigger starts a new node (--upper-breach-increment 1) when the latency goes above 1.2 seconds (--upper-threshold 1.2) and removes an existing node when the latency goes below 0.5 seconds (--lower-threshold 0.5). How do we know these values? The earlier statistics show a response time of about 0.7 s for a 10-thread load and about 1.5 s for a 20-thread load. The trigger is therefore configured to keep the response time roughly where it was with 10 threads (i.e., between 0.5 and 1.2 seconds).
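The trigger's decision rule can be sketched as a tiny Python function. This is an illustrative model of the semantics described above, not the Amazon implementation: the average latency over the breach duration is compared against the two thresholds and the corresponding breach increment is returned.

```python
# Thresholds taken from the as-create-or-update-trigger command above.
LOWER_THRESHOLD = 0.5   # seconds; below this a node is removed
UPPER_THRESHOLD = 1.2   # seconds; above this a node is added

def breach_increment(average_latency):
    """Return +1 to add a node, -1 to remove one, 0 to do nothing."""
    if average_latency > UPPER_THRESHOLD:
        return 1
    if average_latency < LOWER_THRESHOLD:
        return -1
    return 0

print(breach_increment(1.5))   # the 20-thread latency: scale up (+1)
print(breach_increment(0.7))   # the 10-thread latency: hold (0)
print(breach_increment(0.3))   # under-utilized: scale down (-1)
```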

Now let's examine the statistics collected in Amazon CloudWatch with the following command, for the same load used above.

mon-get-stats Latency --start-time 2010-02-10T05:20:00.000Z --end-time 2010-02-10T07:30:00.000Z --period 120 --statistics "Average" --namespace "AWS/ELB" --headers
Time                 Samples  Average             Unit
2010-02-10 05:20:00  604.0    0.8381257168874172  Seconds
2010-02-10 05:22:00  1481.0   0.8374174213369345  Seconds
2010-02-10 05:24:00  1472.0   0.8446638790760871  Seconds
2010-02-10 05:26:00  1471.0   0.8399494439157036  Seconds
2010-02-10 05:28:00  1462.0   0.8497985813953488  Seconds
2010-02-10 05:30:00  1465.0   0.8474125372013651  Seconds
2010-02-10 05:32:00  1096.0   0.8482769945255474  Seconds
2010-02-10 05:34:00  1297.0   1.0546850393215113  Seconds
2010-02-10 05:36:00  1424.0   1.75152170505618    Seconds
2010-02-10 05:38:00  1416.0   1.7486038086158193  Seconds
2010-02-10 05:40:00  2549.0   0.9784180812083169  Seconds
2010-02-10 05:42:00  2978.0   0.8381042531900605  Seconds
2010-02-10 05:44:00  1877.0   0.9409706334576452  Seconds
2010-02-10 05:46:00  2975.0   1.2611195714285715  Seconds
2010-02-10 05:48:00  2997.0   1.2332849245912578  Seconds
2010-02-10 05:50:00  3439.0   1.072427601337598   Seconds
2010-02-10 05:52:00  4061.0   1.1009376889928588  Seconds
2010-02-10 05:54:00  5942.0   0.8330328180747223  Seconds

As expected, these statistics show that the system latency was kept within the given range while the throughput (the number of messages executed) increased. Further, the auto scaling group can now be examined to see the current nodes.

as-describe-auto-scaling-groups autoscleasg

This gives the following output just after the test finished, revealing that there are four instances in the system.

AUTO-SCALING-GROUP  autoscleasg autoscalelc  us-east-1c  autoscalelb  1  10  4 
INSTANCE  i-392bce52  autoscleasg us-east-1c  InService
INSTANCE  i-9307e2f8  autoscleasg us-east-1c  InService
INSTANCE  i-db00e5b0  autoscleasg us-east-1c  InService
INSTANCE  i-cb03e6a0  autoscleasg us-east-1c  InService


Auto scaling is a concept for automatically scaling a system up when the load increases. It is very useful for applications that are subjected to unexpected loads. This article describes auto scaling as a concept that lets a system keep its response time constant while increasing its throughput; in other words, it characterizes a normal system as one whose response time increases with the load. The article demonstrates this behaviour by setting up both a non auto scaled system and an auto scaled system with a loaded service. In the non auto scaled system the response time increased with the load, while in the auto scaled system the response time was kept within a range while the throughput increased.


Amila Chinthaka Suriarachchi, Technical Lead WSO2 Inc.

About Author

  • Amila Suriarachchi
  • Architect, Member, Management Committee - Data Technologies
  • WSO2 Inc.