Overview and Sizing Guidelines

Proper capacity planning ensures your integrations have sufficient resources to handle expected workloads while maintaining performance SLAs. This page provides sizing guidelines, resource estimation methods, and scaling recommendations.

Key metrics for sizing

Metric	Description	How to Measure
Requests per second (RPS)	Expected peak throughput	Load testing with `bal test` or tools like k6, JMeter
Response latency (p95)	Target 95th percentile response time	Performance testing under load
Concurrent connections	Maximum simultaneous client connections	Connection pool configuration
Message size	Average request/response payload size	Log analysis or API analytics
Integration complexity	Number of downstream calls per request	Code analysis

Resource estimation

CPU

Workload	vCPUs per Instance	Notes
Simple passthrough (< 2 downstream calls)	0.25 - 0.5	Mostly I/O bound
Moderate transformation (2-5 downstream calls)	0.5 - 1.0	Some CPU for data mapping
Complex orchestration (5+ calls, heavy transformation)	1.0 - 2.0	CPU-intensive processing
AI/ML inference integration	2.0 - 4.0	Depends on model complexity

Memory

Workload	Memory per Instance	Notes
Lightweight service (JVM)	256 - 512 MB	Minimal heap usage
Standard service (JVM)	512 MB - 1 GB	Typical integration workload
High-throughput service (JVM)	1 - 2 GB	Large payloads, many connections
GraalVM native image	64 - 256 MB	Significantly lower footprint

Instance count

Estimate the number of instances based on throughput requirements:

instances = ceil(peak_RPS / RPS_per_instance) + buffer_instances

Example: If each instance handles 500 RPS and your peak is 2000 RPS:

instances = ceil(2000 / 500) + 1 = 5 instances

Always add at least one buffer instance for rolling updates and failover.

Sizing by deployment target

Kubernetes

resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

Configure Horizontal Pod Autoscaler (HPA):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Virtual machines

Deployment Size	VM Spec	Instances	Handles
Small	2 vCPU, 4 GB RAM	2	Up to 500 RPS
Medium	4 vCPU, 8 GB RAM	3-4	Up to 2000 RPS
Large	8 vCPU, 16 GB RAM	4-8	Up to 10,000 RPS

Serverless (AWS Lambda)

Setting	Recommendation
Memory	512 MB - 1 GB (JVM), 256 MB (GraalVM native)
Timeout	30 seconds (default), adjust per use case
Provisioned Concurrency	Set to expected minimum concurrent executions
Reserved Concurrency	Set to protect downstream systems

Connection pool sizing

Database connection pools

final mysql:Client dbClient = check new ({
    host: "db.example.com",
    port: 3306,
    user: "svc_user",
    password: "password",
    database: "orders",
    connectionPool: {
        maxOpenConnections: 25,
        maxConnectionLifeTime: 1800,
        minIdleConnections: 5
    }
});

Rule of thumb: maxOpenConnections = (number_of_instances * connections_per_instance) <= database_max_connections

HTTP client connection pools

final http:Client apiClient = check new ("https://api.example.com", {
    httpVersion: http:HTTP_1_1,
    poolConfig: {
        maxActiveConnections: 50,
        maxIdleConnections: 10,
        waitTime: 30
    },
    timeout: 30
});

Load testing

Validate your capacity plan with load testing before going to production:

# Using k6
k6 run --vus 100 --duration 5m load-test.js

# Using Apache Bench
ab -n 10000 -c 100 http://localhost:9090/api/orders

Key results to collect

Metric	Target
Throughput (RPS)	Meets or exceeds peak estimate
p95 Latency	Under SLA threshold
Error Rate	Under 0.1%
CPU Utilization	Under 70% at peak
Memory Utilization	Under 80% at peak

Scaling strategies

Strategy	When to Use
Vertical scaling	Single-instance workloads, quick fix
Horizontal scaling	Stateless services, high availability
Auto-scaling (HPA)	Variable traffic patterns
Event-driven scaling (KEDA)	Queue/event-driven workloads

What's next

Performance reports — Benchmark results per scenario
Scaling and high availability — Configure horizontal scaling
GraalVM native images — Reduce resource requirements with native compilation

Key metrics for sizing​

Resource estimation​

CPU​

Memory​

Instance count​

Sizing by deployment target​

Kubernetes​

Virtual machines​

Serverless (AWS Lambda)​

Connection pool sizing​

Database connection pools​

HTTP client connection pools​

Load testing​

Key results to collect​

Scaling strategies​

What's next​