Auto scaling web services on Amazon EC2
Many web applications, including eBanking, eCommerce, and eGovernance systems, are subject to large load fluctuations. Some of these loads are predictable: if there is a new campaign, for example, a high load can be expected for that period. Other loads are seasonal; eCommerce applications can face huge loads at Christmas or at the end of the month, when people receive their salaries. In addition, there can be sudden loads on new sites, or especially on social networking sites, due to the importance of some piece of information. Today, therefore, almost all online applications face the problem of fluctuating loads. The traditional way of dealing with this problem is over-provisioning, which means allocating resources for the peak load. However, for most of the time the system receives only an average load, so resources are wasted under normal conditions.
The concept of auto scaling was introduced to address this peak load provisioning problem. An auto scaled system should be able to increase its processing power by starting new nodes at peak load, and reduce it by shutting down unnecessary nodes. This requirement itself introduces the need for a computing framework which can start up and shut down nodes dynamically.
Amazon Elastic Compute Cloud (EC2) is such a framework: a web application can be deployed as an Amazon instance, and other instances can be created dynamically as necessary. There must then be a way to access different parameters of the system, such as latency and the CPU utilization of the nodes, in order to determine when to auto scale. Amazon CloudWatch provides such a measuring framework for almost all of these parameters. Amazon Auto Scaling is a framework which lets users auto scale applications deployed on EC2 using the measurements obtained from Amazon CloudWatch.
WSO2 Web Services Application Server (WSAS) can be used to host web services, so we can use WSO2 WSAS to deploy a web service instance on EC2. The rest of this article describes how such a web service can be made to auto scale using the Amazon Auto Scaling framework.
What is auto scaling?
As described earlier, auto scaling means scaling the system up or down according to the load it receives, by starting new nodes or shutting down existing ones. But what is the user-perceived difference between an auto scaling system and a non auto scaling system (one provisioned for the average load), given that users are not aware of whether new nodes are started or not? To understand this, we need to understand how a system behaves under load. Here we assume a system without peak load provisioning, whose processing power therefore saturates beyond some load.
Initially, as the load increases, the system throughput also increases, since there is not yet enough load to saturate the system. Once the system is saturated, throughput remains constant, because all the available resources are now utilized. At this stage, if the CPU usage has reached 100% it is a CPU-bound system, and if the I/O usage has reached 100% it is an I/O-bound system. If the load increases further still, throughput decreases, since the system cannot withstand the load.
A system should therefore be auto scaled only after it reaches its saturation state, since before that point the system still has unused resources. At this stage (i.e., where throughput remains constant), the system response time (latency) varies with the load as follows.
As can be seen, response time increases as the system load increases. This is not a desirable property, since it increases the time users have to wait for a response; the application response time should instead be kept constant. The only way to decrease the response time is to increase the processing capacity, i.e., to start new nodes. We can therefore define an auto scaled system as one which keeps the response time constant while increasing throughput under a fluctuating load. For an auto scaled system, throughput and response time should look like this.
Creating an Amazon EC2 image with WSAS installed.
In order to send requests, we need a service, and this service should carry enough work per request to produce a measurable latency. Such a loaded service can be created by generating code from the WSDL file for the service and adding some iterations that parse the request and serialize the response. The service.zip file contains all the source and generated code for the service.
Amazon EC2 provides a set of command line tools which can be used to handle any task with Amazon EC2. Throughout this article these command line tools are used for various purposes; many manuals can be found on configuring the command line tools and obtaining Amazon EC2 accounts. Creation of a new Amazon EC2 image can be started from an existing Amazon EC2 instance. The existing ami-0d729464 image can be started with the following command from the command line tool set. Before proceeding with the sample commands, users should have configured their private keys to access their Amazon EC2 accounts.
ec2-run-instances -k amila_test_key ami-0d729464
This command returns the instance id of the newly created instance, which can be used to look up the public DNS address of the instance; that address can then be used to log into the server.
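The lookup can be done with the ec2-describe-instances tool from the same command line tool set; the instance id shown here is the one used later in this article, and should be replaced with the id returned by the launch command above:

```shell
# Print the details of the launched instance, including its public DNS name
# (substitute the instance id returned by ec2-run-instances)
ec2-describe-instances i-84041eec
```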
ssh -i /home/amila/projects/autoscale/keypair/id-amila_test_key [email protected]
After this command, ssh prompts to verify the host key of the server. Note that it is not required to provide a password, since the key pair for this instance is passed instead. The following commands can be used to prepare the WSO2 WSAS instance. First, a JDK has to be installed.
apt-get install sun-java6-jdk
After that, wget can be used to download the WSO2 WSAS 3.1.3 distribution, and the wso2wsas-3.1.3.zip archive can be extracted using the unzip command, installing it first if necessary.
apt-get install unzip
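The download and extraction steps can be sketched as follows; the download URL is an assumption, since the article does not give the exact location of the wso2wsas-3.1.3.zip distribution:

```shell
# Download the WSO2 WSAS 3.1.3 distribution (URL is illustrative only)
wget http://dist.wso2.org/products/wsas/java/wso2wsas-3.1.3.zip
# Extract it under /root, giving /root/wso2wsas-3.1.3
unzip wso2wsas-3.1.3.zip -d /root
```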
WSO2 WSAS can then be started as follows.
sh /root/wso2wsas-3.1.3/bin/daemon.sh start
This starts WSO2 WSAS as a daemon. After some time, the URL of the admin console can be checked to verify that the server is up.
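Assuming the default WSAS HTTPS port mentioned later in this article (9443), the check might look like the following; the /carbon path is an assumption about the WSAS admin console location:

```shell
# Fetch the admin console page to verify the server is running
# (-k skips certificate verification for the self-signed certificate)
curl -k https://ec2-72-44-62-167.compute-1.amazonaws.com:9443/carbon
```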
Now the server can be stopped, and the start-up commands above can be put into the /etc/rc.local file in order to run the server at instance boot up.
sh /root/wso2wsas-3.1.3/bin/daemon.sh stop
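A minimal /etc/rc.local along those lines might look like the following sketch; the JAVA_HOME path is an assumption based on the sun-java6-jdk package installed earlier:

```shell
#!/bin/sh
# /etc/rc.local - start WSO2 WSAS when the instance boots
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # path assumed from sun-java6-jdk
sh /root/wso2wsas-3.1.3/bin/daemon.sh start
exit 0
```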
Then the PerfService.aar file should be copied to the services folder.
scp -i /home/amila/projects/autoscale/keypair/id-amila_test_key PerfService.aar [email protected]:/root/wso2wsas-3.1.3/repository/services
Now the instance contains everything needed to start up the service. Let's create an Amazon image from it.
First, the cert and pk files should be copied from the local machine to the newly created instance. Here the certificate file names and the -u value should be replaced with the correct values from the user's account details.
scp -i id-amila_test_key cert-ZOSNARTUQKXBJZAUUX2QUDN3GNEHL6K.pem pk-ZOSNARTUQKXIIIBX2QUDN3GNEHL6K.pem [email protected]:
Then the following commands can be used on the newly created instance to create and upload the new image.
ec2-bundle-vol -d /mnt -k pk-ZOSNARTUQKXBJZAGWBX2QUDN3GNEHL6K.pem -u 6109906798 --cert cert-ZOSNARTUQKXBJZAGWBX2QUDN3GNEHL6K.pem
ec2-upload-bundle -m /mnt/image.manifest.xml -a 0Y6WUIDPHER2 -s qwhlHJHJHZC3qfCR9rth4EWOpY83L -b wso2wsas313
Finally, the image can be registered from the local machine. Registration returns the newly created image id, which can be used to create new instances.
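Registration can be sketched with the ec2-register tool, pointing at the manifest uploaded to the wso2wsas313 bucket above; the exact bucket/manifest path is an assumption based on the upload command:

```shell
# Register the uploaded bundle as a new AMI; this prints the new image id
ec2-register wso2wsas313/image.manifest.xml
```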
Configuring a non auto scaling system using an Amazon EC2 load balancer
Before creating an auto scaling system, it is better to create a non auto scaling system and study how its response time varies with the load. Let's create a non auto scaling system using an Amazon load balancer to understand the non scaling nature of the system. This load balancer listens on ports 80 and 443 and forwards requests to ports 9763 and 9443, which are the default WSAS HTTP and HTTPS ports.
elb-create-lb staticlb --headers --listener "lb-port=80,instance-port=9763,protocol=http" --listener "lb-port=443,instance-port=9443,protocol=tcp" --availability-zones us-east-1c
This returns the public DNS name of the load balancer. Then a health check can be configured to check the health of the instances behind the load balancer. This is optional, but it is recommended. The health check pings the given TCP port periodically and checks whether each instance is healthy.
elb-configure-healthcheck staticlb --headers --target "TCP:9763" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2
Now the load balancer is configured. Let's start an instance with the new image and register it with the load balancer.
ec2-run-instances ami-abdc30c2 -k amila_test_key
elb-register-instances-with-lb staticlb --instances i-84041eec
Now the instance is registered with the load balancer. That can be verified with the following command.
elb-describe-instance-health staticlb --headers
This should give output similar to the following if everything works fine.
INSTANCE-ID INSTANCE-ID STATE
INSTANCE-ID i-84041eec InService
Testing the system with a load
The Apache Bench (ab) tool can be used as a client to send a load to the server. A predetermined load can be sent using different concurrency levels. The client.zip file contains all the details necessary for the client. This client is run with different concurrency levels in order to measure the latencies. For this test, loads with 10, 20, 30, and 40 threads are sent from another Amazon EC2 instance to measure the latency.
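An individual run at one concurrency level can be sketched with ab as follows; the request count and service path are assumptions, since the exact client configuration lives in client.zip, and the load balancer DNS name is the one returned by elb-create-lb:

```shell
# Send 10000 requests at a concurrency of 20 through the load balancer
# (request count and /services/PerfService path are illustrative)
ab -c 20 -n 10000 http://<load-balancer-dns>/services/PerfService
```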
Amazon EC2 provides a monitoring framework called Amazon CloudWatch, which can be used to obtain many measurements of the system (e.g., Latency, CPUUtilization, etc.). Let's see how the latency varied with time, using the following command.
mon-get-stats Latency --start-time 2010-02-09T12:30:00.000Z --end-time 2010-02-09T14:30:00.000Z --period 120 --statistics "Average" --namespace "AWS/ELB" --headers
Time Samples Average Unit
2010-02-09 12:54:00 1098.0 0.7402949271402551 Seconds
2010-02-09 12:56:00 1535.0 0.712273029315961 Seconds
2010-02-09 12:58:00 1145.0 0.7465975458515285 Seconds
2010-02-09 13:00:00 1534.0 0.7295192307692307 Seconds
2010-02-09 13:02:00 1489.0 0.7299966554734721 Seconds
2010-02-09 13:04:00 1530.0 0.7443238235294118 Seconds
2010-02-09 13:06:00 1504.0 0.7622699933510638 Seconds
2010-02-09 13:08:00 1349.0 1.4617230096367678 Seconds
2010-02-09 13:10:00 1152.0 1.5603545659722223 Seconds
2010-02-09 13:12:00 1530.0 1.5746869934640523 Seconds
2010-02-09 13:14:00 1529.0 1.5747163243950296 Seconds
2010-02-09 13:16:00 1528.0 1.5761940706806282 Seconds
2010-02-09 13:18:00 1527.0 1.577572514734774 Seconds
2010-02-09 13:20:00 1531.0 1.5558620640104506 Seconds
2010-02-09 13:22:00 964.0 2.3721676348547718 Seconds
2010-02-09 13:24:00 1527.0 2.386863464309103 Seconds
2010-02-09 13:26:00 1533.0 2.3824127984344423 Seconds
2010-02-09 13:28:00 1535.0 2.3786780325732898 Seconds
2010-02-09 13:30:00 1533.0 2.3797070515329422 Seconds
2010-02-09 13:32:00 1529.0 2.3878389666448663 Seconds
2010-02-09 13:34:00 1398.0 2.398446494992847 Seconds
2010-02-09 13:36:00 1095.0 3.206490292237443 Seconds
2010-02-09 13:38:00 1526.0 3.224341749672346 Seconds
2010-02-09 13:40:00 1527.0 3.212400713817944 Seconds
2010-02-09 13:42:00 1533.0 3.1931953359425963 Seconds
2010-02-09 13:44:00 1523.0 3.20844050558109 Seconds
2010-02-09 13:46:00 1528.0 3.1996112696335083 Seconds
2010-02-09 13:48:00 1148.0 3.210251245644599 Seconds
2010-02-09 13:50:00 120.0 3.1848214166666666 Seconds
Looking at the results, it can be noted that the latency increases with the concurrency level, even while the same number of requests is executed within a given time period.
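The step changes in the table are easy to eyeball, but the plateau latency for a phase can also be computed directly from the mon-get-stats output. The following sketch computes the sample-weighted average latency over a set of rows (only two sample rows are shown; in practice the rows for one concurrency phase would be pasted in):

```shell
# Compute the sample-weighted average latency from mon-get-stats output.
# Each row is: date time samples average unit, so samples = $3, average = $4.
awk 'NR > 1 { total += $3 * $4; samples += $3 }
     END { printf "%.2f\n", total / samples }' <<'EOF'
Time Samples Average Unit
2010-02-09 12:54:00 1098.0 0.7402949271402551 Seconds
2010-02-09 12:56:00 1535.0 0.712273029315961 Seconds
EOF
```

Run over all the rows of one phase, this yields the plateau latency for that concurrency level.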
Configuring an auto scaled system using an Amazon load balancer.
First, a load balancer should be created, as in the non auto scaling case, to distribute the load to the nodes.
elb-create-lb autoscalelb --headers --listener "lb-port=80,instance-port=9763,protocol=http" --listener "lb-port=443,instance-port=9443,protocol=tcp" --availability-zones us-east-1c
Configuring a health check is important in this case, since the WSAS instance inside a newly launched Amazon image may not have finished starting even though the image itself has. The health check ensures that the system continually pings the given TCP port and checks the health of each node instance.
elb-configure-healthcheck autoscalelb --headers --target "TCP:9763" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2
In an auto scaling system, the Amazon EC2 framework launches new Amazon instances. To launch them it uses a launch configuration, which contains the parameters needed to launch an Amazon EC2 instance. These parameters are almost the same as those passed when launching an Amazon EC2 instance manually.
as-create-launch-config autoscalelc --image-id ami-abdc30c2 --instance-type m1.small --key amila_test_key
An auto scaling group consists of the set of nodes the system creates using the parameters specified in the given launch configuration. Users can define the minimum and maximum number of nodes in the system. The system adds instances to, or removes them from, the auto scaling group according to the actions specified in the triggers.
as-create-auto-scaling-group autoscleasg --availability-zones us-east-1c --launch-configuration autoscalelc --min-size 1 --max-size 10 --load-balancers autoscalelb
Finally, a trigger should be configured with the actions to take according to the load. This is the most important step in setting up an auto scaled system. Triggers are used either to start new nodes or to remove existing ones; a trigger fires when a given measurement goes below its lower threshold or above its upper threshold. The following command can be used to create the trigger.
as-create-or-update-trigger autoscaletrigger --auto-scaling-group autoscleasg --namespace "AWS/ELB" --measure Latency --statistic Average --dimensions "LoadBalancerName=autoscalelb" --period 60 --lower-threshold 0.5 --upper-threshold 1.2 --lower-breach-increment=-1 --upper-breach-increment 1 --breach-duration 120
This trigger creates a new node (--upper-breach-increment 1) when the latency goes above 1.2 seconds (--upper-threshold 1.2) and removes an existing node (--lower-breach-increment=-1) when the latency goes below 0.5 seconds (--lower-threshold 0.5). How were these values chosen? The earlier statistics show a response time of about 0.7 s for a 10-thread load and about 1.5 s for a 20-thread load, so the trigger is configured to keep the response time at roughly the 10-thread level (i.e., between 0.5 and 1.2 seconds).
Now let's examine the statistics collected in Amazon CloudWatch with the following command, for the same load as used above.
mon-get-stats Latency --start-time 2010-02-10T05:20:00.000Z --end-time 2010-02-10T07:30:00.000Z --period 120 --statistics "Average" --namespace "AWS/ELB" --headers
Time Samples Average Unit
2010-02-10 05:20:00 604.0 0.8381257168874172 Seconds
2010-02-10 05:22:00 1481.0 0.8374174213369345 Seconds
2010-02-10 05:24:00 1472.0 0.8446638790760871 Seconds
2010-02-10 05:26:00 1471.0 0.8399494439157036 Seconds
2010-02-10 05:28:00 1462.0 0.8497985813953488 Seconds
2010-02-10 05:30:00 1465.0 0.8474125372013651 Seconds
2010-02-10 05:32:00 1096.0 0.8482769945255474 Seconds
2010-02-10 05:34:00 1297.0 1.0546850393215113 Seconds
2010-02-10 05:36:00 1424.0 1.75152170505618 Seconds
2010-02-10 05:38:00 1416.0 1.7486038086158193 Seconds
2010-02-10 05:40:00 2549.0 0.9784180812083169 Seconds
2010-02-10 05:42:00 2978.0 0.8381042531900605 Seconds
2010-02-10 05:44:00 1877.0 0.9409706334576452 Seconds
2010-02-10 05:46:00 2975.0 1.2611195714285715 Seconds
2010-02-10 05:48:00 2997.0 1.2332849245912578 Seconds
2010-02-10 05:50:00 3439.0 1.072427601337598 Seconds
2010-02-10 05:52:00 4061.0 1.1009376889928588 Seconds
2010-02-10 05:54:00 5942.0 0.8330328180747223 Seconds
As expected, these statistics show that the system latency was kept within the given range while the throughput (the number of messages executed) increased. Further, the auto scaling group can now be examined to see the current nodes.
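The group can be inspected with the as-describe-auto-scaling-groups tool from the Auto Scaling command line tools; the flags here are assumed by analogy with the other as-* commands used in this article:

```shell
# Describe the auto scaling group created above, listing its instances
as-describe-auto-scaling-groups autoscleasg --headers
```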
Describing the group just after the test finished gives output like the following, revealing that there are four instances in the system.
AUTO-SCALING-GROUP autoscleasg autoscalelc us-east-1c autoscalelb 1 10 4
INSTANCE i-392bce52 autoscleasg us-east-1c InService
INSTANCE i-9307e2f8 autoscleasg us-east-1c InService
INSTANCE i-db00e5b0 autoscleasg us-east-1c InService
INSTANCE i-cb03e6a0 autoscleasg us-east-1c InService
Auto scaling is a concept for automatically scaling a system up as the load increases, which is very useful for applications subjected to unexpected loads. This article describes auto scaling as a concept which lets a system keep its response time constant while increasing throughput; in other words, it characterises a normal (non auto scaled) system as one whose response time increases with the load. The article demonstrates this behaviour by setting up both a non auto scaled system and an auto scaled system with a loaded service. In the non auto scaled system the response time increased with the load, while in the auto scaled system the response time was kept within a range while the throughput increased.
Amila Chinthaka Suriarachchi, Technical Lead WSO2 Inc.