11 Mar, 2024 | 3 min read

Scaling Smart: Introducing Choreo’s Scale-to-Zero for Optimal Resource Utilization

  • Lakmal Warusawithana
  • Senior Director - Cloud Architecture - WSO2

While the cloud facilitates quicker and easier completion of tasks, it's important to use resources efficiently: careless usage can lead to high costs over time. Choreo's new scale-to-zero feature allows you to minimize costs by scaling an application's resource usage down to almost zero when it is not in active use. This capability is a significant step toward making apps more resource-conscious and cost-effective.

Choreo’s serverless service model

Scale-to-zero is primarily aimed at service types within Choreo. Traditionally, services are designed to run continuously, ready to handle requests at any moment. In reality, however, many services don't receive a steady stream of requests, yet they keep running because there's no mechanism to scale down when idle and back up on demand. This is the gap that Choreo's latest feature bridges, offering a significant improvement in efficiency.

Now services in Choreo can scale down when they are not in use so that idle services don’t unnecessarily consume resources. But what happens when a new request comes in? Here's the clever part: the first incoming request is temporarily held back while Choreo instructs its internal APIs to scale up the service workload. Once the service is adequately scaled and ready, the request is then processed.
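The hold-and-release behavior described above can be sketched in a few lines. This is a toy model under assumptions of our own: the class and method names are illustrative, not Choreo's actual internal APIs, and the real platform would poll the workload for readiness rather than sleep.

```python
import time


class ScaleToZeroProxy:
    """Toy model of hold-and-release: an idle service sits at zero
    replicas, and the first incoming request is held back until the
    workload has scaled up. Names here are illustrative only."""

    def __init__(self, scale_up_delay: float = 0.01):
        self.replicas = 0  # idle services consume no replicas
        self.scale_up_delay = scale_up_delay

    def _scale_up(self) -> None:
        # A real platform would call an internal scaling API and poll
        # until the workload reports ready; we simulate that delay.
        time.sleep(self.scale_up_delay)
        self.replicas = 1

    def handle(self, request: str) -> str:
        if self.replicas == 0:
            # Hold the request while the service scales up from zero.
            self._scale_up()
        # Once at least one replica is ready, process the request.
        return f"processed: {request}"


proxy = ScaleToZeroProxy()
print(proxy.handle("GET /orders"))  # first request triggers scale-up
print(proxy.replicas)               # 1
```

Subsequent requests see no delay, since the service is already running; only the first request after an idle period pays the scale-up cost.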

Scale-to-zero will be enabled by default 

With the scale-to-zero feature, Choreo is transforming how HTTP-based services operate within its ecosystem. This change affects all types of services, including public APIs, internal APIs used within organizations, and web applications. By default, these services will now adopt scale-to-zero configurations.

But what does this mean in practice? The minimum replica count for services is now set to zero, so an idle service can scale all the way down and consume no resources. Users have the flexibility to set a maximum replica count based on their anticipated service load. When the need arises, the service scales up by adding more replicas to the cluster in response to incoming requests.

The scaling of services is based on the number of requests waiting in the load balancer, which manages the traffic to the services. This method is particularly suitable for HTTP-based services because it focuses on the real-time demand instead of just monitoring the CPU and memory usage within containers. Users can adjust the queue size in the load balancer, giving them control over how quickly and efficiently their services scale up to meet demand.
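Request-based scaling of this kind can be approximated with a simple rule: grow the replica count until no replica's share of the queued requests exceeds the configured queue size, cap it at the maximum, and drop to zero when the queue is empty. The sketch below is our own illustration of that idea; the formula and names are assumptions, not Choreo's exact algorithm.

```python
import math


def desired_replicas(pending_requests: int,
                     target_queue_size: int,
                     max_replicas: int) -> int:
    """Illustrative request-based scaling rule: enough replicas so that
    each handles at most target_queue_size queued requests, zero when
    idle, never more than max_replicas."""
    if pending_requests == 0:
        return 0  # scale-to-zero when no requests are waiting
    needed = math.ceil(pending_requests / target_queue_size)
    return min(needed, max_replicas)


print(desired_replicas(0, 10, 5))    # 0 — idle, scale to zero
print(desired_replicas(25, 10, 5))   # 3 — 25 queued, 10 per replica
print(desired_replicas(200, 10, 5))  # 5 — capped at the maximum
```

Lowering the target queue size makes the service scale out more aggressively at the cost of running more replicas, which is the trade-off the queue-size setting exposes.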

Is HPA being phased out?

The Horizontal Pod Autoscaler (HPA) will remain unchanged. If users prefer to scale their application services based on CPU or memory consumption, they can still use HPA. However, unlike scale-to-zero, HPA doesn't allow scaling down to zero replicas, since CPU and memory usage never drop to zero even when a service is idle.


Read our documentation to discover more about how to use scale-to-zero in your application. If you haven't already, sign up and begin your journey with Choreo today for free.