Scaling & High Availability
Design and configure your WSO2 Integrator deployments for horizontal scaling, high availability, and resilience in production environments.
Stateless design principles
Ballerina services in WSO2 Integrator scale horizontally with minimal effort as long as they hold no local state. Follow these principles to ensure your integrations scale correctly:
- Avoid in-memory state -- Do not store session or request state in module-level variables. Use external stores (databases, Redis, or distributed caches) for shared state.
- Externalize configuration -- Use Config.toml and environment variables rather than hardcoded values (a sample Config.toml follows the code block below).
- Idempotent operations -- Design resource functions to produce the same result when called multiple times, allowing safe retries behind a load balancer (see the sketch after the code block below).
import ballerina/http;

// Use an external cache or database for shared state; connection
// details come from configuration rather than code.
configurable string redisHost = "localhost";
configurable int redisPort = 6379;

service /orders on new http:Listener(9090) {

    // Each request is self-contained -- no in-memory session state.
    resource function get [string orderId]() returns json|error {
        // Fetch from the external data store (getOrderFromDatabase is
        // a placeholder for your persistence logic).
        json orderData = check getOrderFromDatabase(orderId);
        return orderData;
    }
}
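The configurable values above are then supplied per environment rather than baked into the image. A minimal Config.toml sketch (the host value is illustrative):

# Config.toml
redisHost = "redis.production.internal"
redisPort = 6379

For idempotent operations, prefer upsert-style writes keyed by a client-supplied identifier, so a retried request cannot create a duplicate. A minimal sketch, where upsertOrder is a hypothetical insert-or-replace persistence helper returning error?:

import ballerina/http;

service /orders on new http:Listener(9090) {

    // PUT with a client-chosen ID is naturally idempotent: replaying
    // the same request overwrites the same record instead of creating
    // a new one. upsertOrder is a hypothetical persistence helper.
    resource function put [string orderId](@http:Payload json payload) returns json|error {
        check upsertOrder(orderId, payload);
        return {id: orderId, status: "accepted"};
    }
}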
Horizontal scaling configuration
Kubernetes replica scaling
Set the number of replicas in your Kubernetes deployment:
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: wso2-integrator-app
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: wso2-integrator-app
template:
metadata:
labels:
app: wso2-integrator-app
spec:
containers:
- name: app
image: registry.example.com/wso2-integrator-app:latest
ports:
- containerPort: 9090
resources:
requests:
cpu: "250m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
Horizontal pod autoscaler (HPA)
Automatically scale based on CPU or memory utilization. The behavior section below damps flapping: scale-up reacts after a 30-second stabilization window and may add up to two pods per minute, while scale-down waits through a 300-second window and removes at most one pod per minute:
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: wso2-integrator-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: wso2-integrator-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Pods
value: 1
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Pods
value: 2
periodSeconds: 60
Cloud.toml auto-scaling
Alternatively, configure auto-scaling directly in your project's Cloud.toml; when the project is built for Kubernetes, these settings generate an HPA equivalent to the manifest above:
# Cloud.toml
[cloud.deployment.autoscaling]
min_replicas = 2
max_replicas = 10
cpu = 70      # CPU utilization threshold (%)
memory = 80   # memory utilization threshold (%)
Load balancing considerations
Kubernetes service
Expose your deployment through a Kubernetes Service for internal load balancing:
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
name: wso2-integrator-service
namespace: production
spec:
type: ClusterIP
selector:
app: wso2-integrator-app
ports:
- port: 80
targetPort: 9090
protocol: TCP
Ingress configuration
For external traffic, configure an Ingress resource:
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: wso2-integrator-ingress
namespace: production
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "10m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
tls:
- hosts:
- api.example.com
secretName: tls-secret
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: wso2-integrator-service
port:
number: 80
Session affinity
If your integration requires sticky sessions (not recommended for stateless services), configure session affinity on the Service:
spec:
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 600
Health checks
Configure health probes so Kubernetes can detect unhealthy instances and route traffic away from them.
Ballerina health endpoint
In Ballerina, health check endpoints are ordinary HTTP resources. Add health and readiness resources to your service:
import ballerina/http;

service /orders on new http:Listener(9090) {

    // Liveness endpoint for Kubernetes probes.
    resource function get healthz() returns http:Ok {
        return {body: "healthy"};
    }

    // Readiness endpoint -- verify external dependencies first.
    resource function get readyz() returns http:Ok|http:ServiceUnavailable {
        boolean dbHealthy = checkDatabaseConnection();
        if dbHealthy {
            http:Ok ok = {body: "ready"};
            return ok;
        }
        http:ServiceUnavailable unavailable = {body: "database unreachable"};
        return unavailable;
    }
}
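The readiness resource above delegates to a helper that pings the data store. A minimal sketch, assuming a PostgreSQL store via the ballerinax/postgresql connector (connection parameters are illustrative and would normally come from configurable values):

import ballerina/sql;
import ballerinax/postgresql;

// Illustrative client; in practice, read these from configuration.
final postgresql:Client dbClient = check new (
    host = "db.production.internal",
    username = "app",
    password = "secret",
    database = "orders"
);

// Healthy only if a trivial query round-trips successfully.
function checkDatabaseConnection() returns boolean {
    int|sql:Error result = dbClient->queryRow(`SELECT 1`);
    return result is int;
}

Keep the readiness query cheap; under the probe configuration below, it runs every five seconds on every replica.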
Kubernetes probe configuration
spec:
containers:
- name: app
livenessProbe:
httpGet:
path: /orders/healthz
port: 9090
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /orders/readyz
port: 9090
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 3
startupProbe:
httpGet:
path: /orders/healthz
port: 9090
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 30
Failover and resilience
Multi-zone deployment
Distribute pods across availability zones using topology spread constraints:
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: wso2-integrator-app
Pod disruption budget
Ensure a minimum number of pods remain available during voluntary disruptions (node upgrades, scaling events):
# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: wso2-integrator-pdb
namespace: production
spec:
minAvailable: 2
selector:
matchLabels:
app: wso2-integrator-app
Graceful shutdown
Ballerina services handle graceful shutdown automatically. When a SIGTERM signal is received, the runtime stops accepting new requests and waits for in-flight requests to complete before shutting down.
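If the service holds external resources such as client connections, you can also register an explicit stop handler that runs during this shutdown sequence. A minimal sketch using runtime:onGracefulStop from ballerina/lang.runtime (the cleanup body is illustrative):

import ballerina/lang.runtime;
import ballerina/log;

function init() {
    // Invoked during graceful shutdown, after the listeners stop
    // accepting new requests; close clients and flush state here.
    runtime:onGracefulStop(isolated function() returns error? {
        log:printInfo("Shutting down: releasing external connections");
    });
}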
Configure the Kubernetes termination grace period to allow enough time for in-flight requests:
spec:
terminationGracePeriodSeconds: 60
containers:
- name: app
lifecycle:
preStop:
exec:
command: ["sleep", "5"] # Allow load balancer to deregister
The preStop sleep ensures the pod is removed from the Service endpoints before the application begins shutting down, preventing requests from being routed to a terminating pod.
What's next
- Environments -- Manage configuration across environments
- Deploy to AWS / Azure / GCP -- Cloud-specific scaling options
- Metrics -- Monitor scaling behavior with Prometheus