Mastering the Art and Science of Capacity Planning
- Samudra Weerasinghe
- Senior Lead Marketing Officer - WSO2
Imagine an e-commerce provider running out of memory on Black Friday, or a search engine provider paying 20 times the cost of its optimal server capacity. Both of these scenarios are great examples of what makes a solutions architect cringe. That’s why forecasting the capacity of a system is an important activity in enterprise system design and solution architecture.
Capacity planning is an art as much as it is a science. Along with measurable parameters, it also draws on experience, knowledge of the domain itself and insight into the system. In some instances, it goes as far as analyzing the psychology of the system’s expected users and their usage patterns.
Mifan Careem from the WSO2 Solutions Architecture team recently wrote a white paper that looks at factors affecting the capacity of a system and how you can calculate your system’s capacity using these factors.
Here are some insights from this white paper.
There are multiple methodologies for carrying out capacity planning. A few parameters, practices and design considerations that will help include:
- Transactions per second: number of actions per unit time
- Work done per transaction: level of operations a transaction triggers
- Think time: delay between user requests
- Active users: users who use the system at a given time
- Concurrent users: a subset of active users that perform actions at the same time
- Message size: size of the message passed across the ‘wire’
- Latency: additional time spent due to the introduction of a system
- Other non-functional QoS requirements such as guaranteed message delivery, transmission of secure messages, throttling and uptime
- Profiling and load testing your application
- Caching to improve your performance and latency
- Having buffer capacity when allocating server specifications
- Server profiling via monitoring and profiling tools
- Scalability: the ability to handle requests in proportion to available hardware resources
- High availability: the ability of a system to remain continuously operational for a long period of time
- Disaster recovery: the replication of the primary site onto a geographically separate site
- Backup and recovery: the replication of system state and system data onto a backup medium
- Cloud: allows servers to be deployed in geographically separate locations, providing an accessible means of achieving full-scale high availability
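
To see how a few of the parameters above relate to one another, here is a minimal sketch in Python. It is not the method used in the white paper. It applies Little's Law for a closed system (active users = throughput x (response time + think time)), and every function name, the per-server throughput figure and the 30% buffer are illustrative assumptions rather than recommendations.

```python
import math


def required_tps(active_users: int, response_time_s: float, think_time_s: float) -> float:
    """Throughput needed to serve the active users.

    Little's Law for a closed system: N = X * (R + Z), solved for X.
    """
    return active_users / (response_time_s + think_time_s)


def concurrent_requests(tps: float, response_time_s: float) -> float:
    """Average number of requests in flight at any instant: X * R."""
    return tps * response_time_s


def bandwidth_bytes_per_s(tps: float, message_size_bytes: int) -> float:
    """Payload bytes crossing the wire per second, ignoring protocol overhead."""
    return tps * message_size_bytes


def servers_needed(tps: float, tps_per_server: float, buffer_fraction: float = 0.3) -> int:
    """Server count with buffer capacity on top of the expected load.

    tps_per_server is assumed to come from load testing your own application.
    """
    return math.ceil(tps * (1 + buffer_fraction) / tps_per_server)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 active users, 0.5 s response time, 9.5 s think time.
    tps = required_tps(active_users=1000, response_time_s=0.5, think_time_s=9.5)
    print(tps)                                   # 100.0 transactions per second
    print(concurrent_requests(tps, 0.5))         # 50.0 requests in flight on average
    print(bandwidth_bytes_per_s(tps, 2048))      # 204800.0 bytes per second
    print(servers_needed(tps, tps_per_server=40))  # 4 servers including the buffer
```

In this hypothetical workload, only about 50 of the 1,000 active users have a request in flight at any moment, which is why think time matters so much when sizing a system. Real estimates should be validated with load testing, since work done per transaction and QoS requirements change the per-server throughput.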