TL;DR
- Validate auto-scaling policies with real load tests before production—K6 for JavaScript/TypeScript teams, Locust for Python teams
- Use Terraform to provision ephemeral load test infrastructure: spin up, test, tear down—pay only for test duration
- Test three scenarios minimum: sustained load (baseline), spike load (auto-scaling trigger), and recovery (scale-down behavior)
Best for: Teams with auto-scaling infrastructure who need to validate scaling policies and understand capacity limits
Skip if: You’re running fixed-capacity infrastructure without auto-scaling (focus on capacity planning instead)
Read time: 14 minutes
Auto-scaling policies that work in theory often fail under real load. A policy that triggers at 70% CPU might scale too slowly, leaving users waiting. Or it might scale too aggressively, wasting budget. The only way to know your infrastructure handles load correctly is to test it.
For related infrastructure testing, see Terraform Testing Strategies and Network Configuration Testing.
AI-Assisted Approaches
AI tools excel at generating load test scripts and analyzing performance patterns.
Generating K6 load test scenarios:
Write a K6 load test script that validates auto-scaling behavior:
Target application: REST API with /api/users, /api/orders endpoints
Infrastructure: AWS ALB + Auto Scaling Group (min: 2, max: 10, target CPU: 70%)
Include four test stages:
1. Warm-up: Gradually ramp to 100 VUs over 2 minutes
2. Sustained load: Maintain 500 VUs for 10 minutes (should trigger scale-up)
3. Spike: Burst to 2000 VUs for 1 minute, then back to 500
4. Cool-down: Gradually decrease to 0 over 5 minutes (should trigger scale-down)
Add thresholds for:
- p95 response time < 500ms
- Error rate < 1%
- Custom metrics to track scaling events
Include CloudWatch integration to correlate load with ASG instance count.
Analyzing auto-scaling behavior:
Analyze these load test results and auto-scaling metrics:
Load test timeline:
- 0-2min: Ramp to 100 VUs, p95=120ms
- 2-12min: 500 VUs sustained, p95 started at 150ms, grew to 800ms by minute 8
- ASG scaled from 2 to 4 instances at minute 6, to 6 instances at minute 10
- 12-13min: Spike to 2000 VUs, p95=2500ms, error rate 15%
Questions:
1. Is the scaling policy too slow? What should the target tracking value be?
2. Why did latency grow before scaling happened?
3. What explains the high error rate during the spike?
4. Recommend specific changes to the auto-scaling configuration.
Creating Locust distributed load tests:
Create a Locust load test for e-commerce checkout flow:
1. Browse products (70% of traffic)
2. Add to cart (20% of traffic)
3. Checkout (10% of traffic)
Include:
- Realistic think times between actions
- Session handling for cart state
- Custom metrics for each flow stage
- Distributed setup configuration for running on Kubernetes
Show how to run this with 10 worker pods to generate 50,000 concurrent users.
When to Use Different Testing Approaches
Testing Strategy Decision Framework
| Test Type | Tool | Purpose | When to Run |
|---|---|---|---|
| Smoke test | K6/Locust | Verify system works under minimal load | Every deployment |
| Load test | K6/Locust | Validate performance at expected load | Weekly, before releases |
| Stress test | K6/Locust | Find breaking points | Monthly, after infra changes |
| Spike test | K6/Locust | Validate auto-scaling behavior | After scaling policy changes |
| Soak test | K6/Locust | Find memory leaks, connection exhaustion | Quarterly |
Auto-Scaling Validation Checklist
| Validation | What to Check | Success Criteria |
|---|---|---|
| Scale-up trigger | Time from threshold breach to new instance | < 3 minutes |
| Scale-up capacity | New instances handle traffic immediately | No request failures |
| Scale-down trigger | Instances removed when load decreases | Within cooldown period |
| Scale-down safety | No premature termination during traffic | Zero dropped requests |
| Maximum capacity | System handles max instances' worth of load | Meets SLA at max scale |
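To put a number on the scale-up trigger row, the duration of recent scaling activities can be read directly from the Auto Scaling API. Below is a minimal boto3 sketch (not part of the original tooling); it measures the launch portion of scale-up, while any alarm-evaluation delay in front of it shows up in CloudWatch alarm history instead. The script name, the "Launching" description check, and the 180-second limit are assumptions taken from the checklist above.

# scripts/measure-scale-up.py (illustrative sketch)
import sys
import boto3

def scale_up_latencies(asg_name, region="us-east-1"):
    """Return (description, seconds) for recent completed scale-up activities."""
    client = boto3.client("autoscaling", region_name=region)
    activities = client.describe_scaling_activities(
        AutoScalingGroupName=asg_name, MaxRecords=20
    )["Activities"]
    results = []
    for activity in activities:
        # Only finished launch activities have an EndTime to measure against.
        # The "Launching" check is a simple heuristic to skip scale-in activities.
        if (
            activity.get("StatusCode") == "Successful"
            and "EndTime" in activity
            and "Launching" in activity.get("Description", "")
        ):
            latency = (activity["EndTime"] - activity["StartTime"]).total_seconds()
            results.append((activity["Description"], latency))
    return results

if __name__ == "__main__":
    for description, seconds in scale_up_latencies(sys.argv[1]):
        flag = "OK" if seconds < 180 else "SLOW"  # 3-minute target from the checklist
        print(f"[{flag}] {seconds:.0f}s  {description}")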
K6 for Scalability Testing
Basic Auto-Scaling Validation Test
// tests/autoscaling-validation.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
// textSummary is needed by handleSummary() below
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';
// Custom metrics
const errorRate = new Rate('errors');
const scalingLatency = new Trend('scaling_latency');
export const options = {
scenarios: {
// Stage 1: Warm-up
warmup: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 100 },
],
gracefulRampDown: '0s',
exec: 'defaultScenario',
},
// Stage 2: Sustained load (should trigger scale-up)
sustained: {
executor: 'constant-vus',
vus: 500,
duration: '10m',
startTime: '2m',
exec: 'defaultScenario',
},
// Stage 3: Spike (stress test auto-scaling)
spike: {
executor: 'ramping-vus',
startVUs: 500,
stages: [
{ duration: '30s', target: 2000 },
{ duration: '1m', target: 2000 },
{ duration: '30s', target: 500 },
],
startTime: '12m',
exec: 'defaultScenario',
},
// Stage 4: Cool-down (should trigger scale-down)
cooldown: {
executor: 'ramping-vus',
startVUs: 500,
stages: [
{ duration: '5m', target: 0 },
],
startTime: '14m',
gracefulRampDown: '30s',
exec: 'defaultScenario',
},
},
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
errors: ['rate<0.01'], // Error rate under 1%
http_req_failed: ['rate<0.01'], // Failed requests under 1%
},
};
const BASE_URL = __ENV.TARGET_URL || 'https://api.example.com';
export function defaultScenario() {
// Simulate realistic API usage
const endpoints = [
{ path: '/api/users', weight: 0.5 },
{ path: '/api/orders', weight: 0.3 },
{ path: '/api/products', weight: 0.2 },
];
const random = Math.random();
let cumulative = 0;
let selectedEndpoint = endpoints[0].path;
for (const endpoint of endpoints) {
cumulative += endpoint.weight;
if (random <= cumulative) {
selectedEndpoint = endpoint.path;
break;
}
}
const response = http.get(`${BASE_URL}${selectedEndpoint}`);
const success = check(response, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
});
errorRate.add(!success);
// Realistic think time
sleep(Math.random() * 2 + 1);
}
export function handleSummary(data) {
return {
'results/summary.json': JSON.stringify(data, null, 2),
stdout: textSummary(data, { indent: ' ', enableColors: true }),
};
}
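Run the script locally with `k6 run --env TARGET_URL=https://your-staging-endpoint tests/autoscaling-validation.js` (substituting your own staging URL) before wiring it into CI; the four scenarios add up to roughly 19 minutes end to end.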
K6 with CloudWatch Integration
// tests/k6-cloudwatch.js
import http from 'k6/http';
import { check } from 'k6';
import { AWSConfig, CloudWatchClient } from 'https://jslib.k6.io/aws/0.11.0/cloudwatch.js';
const awsConfig = new AWSConfig({
region: __ENV.AWS_REGION || 'us-east-1',
accessKeyId: __ENV.AWS_ACCESS_KEY_ID,
secretAccessKey: __ENV.AWS_SECRET_ACCESS_KEY,
});
const cloudwatch = new CloudWatchClient(awsConfig);
export const options = {
scenarios: {
load_test: {
executor: 'ramping-vus',
stages: [
{ duration: '5m', target: 500 },
{ duration: '10m', target: 500 },
{ duration: '5m', target: 0 },
],
},
},
};
export default function () {
const response = http.get(__ENV.TARGET_URL);
check(response, {
'status is 200': (r) => r.status === 200,
});
// Push custom metrics to CloudWatch
cloudwatch.putMetricData({
Namespace: 'K6/LoadTest',
MetricData: [
{
MetricName: 'ResponseTime',
Value: response.timings.duration,
Unit: 'Milliseconds',
Dimensions: [
{ Name: 'Endpoint', Value: response.url },
],
},
],
});
}
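Pushing one PutMetricData call per iteration adds a blocking API round-trip to every request and can run into CloudWatch API throttling at high VU counts, so in practice you would batch several data points per call or sample only a fraction of requests so the load generator doesn't skew its own measurements.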
Locust for Scalability Testing
Distributed Load Test Configuration
# locustfile.py
from locust import HttpUser, task, between, events
from locust.runners import MasterRunner
import time
import logging
class WebsiteUser(HttpUser):
wait_time = between(1, 3)
def on_start(self):
"""Initialize user session."""
self.client.headers = {'Content-Type': 'application/json'}
    @task(7)  # weights 7:2:1 give the 70/20/10 split described in the docstrings
def browse_products(self):
"""70% of traffic - Browse products."""
with self.client.get("/api/products", catch_response=True) as response:
if response.status_code == 200:
response.success()
else:
response.failure(f"Got status {response.status_code}")
@task(2)
def view_product_detail(self):
"""20% of traffic - View product details."""
product_id = self.get_random_product_id()
self.client.get(f"/api/products/{product_id}")
@task(1)
def checkout_flow(self):
"""10% of traffic - Full checkout flow."""
# Add to cart
self.client.post("/api/cart", json={
"product_id": self.get_random_product_id(),
"quantity": 1
})
# Checkout
with self.client.post("/api/checkout", json={
"payment_method": "card"
}, catch_response=True) as response:
if response.status_code in [200, 201]:
response.success()
elif response.status_code == 503:
response.failure("Service unavailable - scaling issue?")
def get_random_product_id(self):
import random
return random.randint(1, 1000)
# Custom metrics for scaling analysis
@events.request.add_listener
def track_response_time(request_type, name, response_time, response_length, **kwargs):
if response_time > 1000: # Log slow requests
logging.warning(f"Slow request: {name} took {response_time}ms")
# Report generation
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
if isinstance(environment.runner, MasterRunner):
print("\n=== Auto-Scaling Analysis ===")
stats = environment.runner.stats
print(f"Total requests: {stats.total.num_requests}")
print(f"Failure rate: {stats.total.fail_ratio:.2%}")
print(f"Average response time: {stats.total.avg_response_time:.0f}ms")
print(f"95th percentile: {stats.total.get_response_time_percentile(0.95):.0f}ms")
Kubernetes Deployment for Distributed Locust
# locust-master.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: locust-master
spec:
replicas: 1
selector:
matchLabels:
app: locust
role: master
template:
metadata:
labels:
app: locust
role: master
spec:
containers:
- name: locust
image: locustio/locust:2.20.0
args:
- --master
- -f
- /mnt/locust/locustfile.py
- --host
- $(TARGET_HOST)
env:
- name: TARGET_HOST
valueFrom:
configMapKeyRef:
name: locust-config
key: target_host
ports:
- containerPort: 8089
- containerPort: 5557
volumeMounts:
- name: locust-scripts
mountPath: /mnt/locust
volumes:
- name: locust-scripts
configMap:
name: locust-scripts
---
# locust-worker.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: locust-worker
spec:
replicas: 10 # 10 workers for distributed testing
selector:
matchLabels:
app: locust
role: worker
template:
metadata:
labels:
app: locust
role: worker
spec:
containers:
- name: locust
image: locustio/locust:2.20.0
args:
- --worker
- --master-host=locust-master
- -f
- /mnt/locust/locustfile.py
volumeMounts:
- name: locust-scripts
mountPath: /mnt/locust
volumes:
- name: locust-scripts
configMap:
name: locust-scripts
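One thing these manifests assume: the worker's `--master-host=locust-master` only resolves if a Kubernetes Service named locust-master exists in the same namespace and exposes the master's ports (5557 for worker communication, plus 8089 if you want the web UI reachable), so add one alongside the two Deployments.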
Terraform for Load Test Infrastructure
Ephemeral Load Test Environment
# modules/load-test-infra/main.tf
variable "run_load_test" {
description = "Set to true to provision load test infrastructure"
type = bool
default = false
}
variable "worker_count" {
description = "Number of K6/Locust workers"
type = number
default = 5
}
# ECS cluster for load generators
resource "aws_ecs_cluster" "load_test" {
count = var.run_load_test ? 1 : 0
name = "load-test-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Purpose = "LoadTesting"
AutoClean = "true"
}
}
# K6 task definition
resource "aws_ecs_task_definition" "k6" {
count = var.run_load_test ? 1 : 0
family = "k6-load-test"
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
cpu = 2048
memory = 4096
execution_role_arn = aws_iam_role.ecs_execution[0].arn
container_definitions = jsonencode([
{
name = "k6"
image = "grafana/k6:latest"
command = [
"run",
"--out", "cloud",
"/scripts/load-test.js"
]
environment = [
{
name = "TARGET_URL"
value = var.target_url
},
{
name = "K6_CLOUD_TOKEN"
value = var.k6_cloud_token
}
]
mountPoints = [
{
sourceVolume = "scripts"
containerPath = "/scripts"
}
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = "/ecs/k6-load-test"
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "k6"
}
}
}
])
volume {
name = "scripts"
efs_volume_configuration {
file_system_id = aws_efs_file_system.scripts[0].id
}
}
}
# Run load test as ECS service
resource "aws_ecs_service" "k6_workers" {
count = var.run_load_test ? 1 : 0
name = "k6-workers"
cluster = aws_ecs_cluster.load_test[0].id
task_definition = aws_ecs_task_definition.k6[0].arn
desired_count = var.worker_count
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.load_test[0].id]
}
}
output "load_test_status" {
value = var.run_load_test ? "Load test infrastructure deployed with ${var.worker_count} workers" : "Load test infrastructure not deployed"
}
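Because every resource in the module is gated on var.run_load_test via count, the spin-up/tear-down cycle from the TL;DR is just two applies: `terraform apply -var="run_load_test=true"` before the test, then the same command with the flag back to false afterwards, which removes the cluster, task definition, and service so you stop paying the moment the test ends.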
CI/CD Integration
GitHub Actions Load Test Workflow
name: Scalability Testing
on:
schedule:
- cron: '0 4 * * 1' # Weekly Monday 4 AM
workflow_dispatch:
inputs:
test_duration:
description: 'Test duration (e.g., 10m, 1h)'
default: '20m'
max_vus:
description: 'Maximum virtual users'
default: '1000'
jobs:
load-test:
runs-on: ubuntu-latest
    environment: load-test
    permissions:
      id-token: write   # required for configure-aws-credentials to assume the role via OIDC
      contents: read
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.LOAD_TEST_ROLE_ARN }}
aws-region: us-east-1
- name: Setup K6
run: |
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
| sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6
- name: Record baseline metrics
id: baseline
run: |
# Get current ASG instance count
INSTANCE_COUNT=$(aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names ${{ vars.ASG_NAME }} \
--query 'AutoScalingGroups[0].Instances | length(@)')
echo "baseline_instances=$INSTANCE_COUNT" >> $GITHUB_OUTPUT
- name: Run load test
run: |
          mkdir -p results
          # Stage timings and VU counts come from the scenarios defined in the script;
          # the --vus/--duration shortcuts conflict with script-defined scenarios.
          k6 run tests/autoscaling-validation.js \
            --env TARGET_URL=${{ vars.TARGET_URL }} \
            --out json=results/k6-results.json
- name: Analyze scaling behavior
run: |
# Get scaling events during test
aws autoscaling describe-scaling-activities \
--auto-scaling-group-name ${{ vars.ASG_NAME }} \
--max-items 20 \
--query 'Activities[?StartTime>=`'$(date -d '30 minutes ago' -Iseconds)'`]' \
> results/scaling-events.json
python3 scripts/analyze-scaling.py \
--k6-results results/k6-results.json \
--scaling-events results/scaling-events.json \
--output results/analysis.md
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: load-test-results
path: results/
- name: Check thresholds
run: |
# Fail if scaling was too slow or error rate too high
python3 scripts/check-thresholds.py \
--results results/k6-results.json \
--max-p95-latency 500 \
--max-error-rate 0.01 \
--max-scale-time 180
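The workflow's last step calls scripts/check-thresholds.py, which isn't shown here. A minimal sketch of what such a gate could look like, assuming k6's newline-delimited JSON output (one metric point per line); the scale-time check is accepted but not enforced, since it would also need the scaling-events file:

# scripts/check-thresholds.py (sketch - not the article's actual script)
import argparse
import json
import sys

def load_points(path, metric):
    """Yield values for one metric from a k6 --out json (newline-delimited) file."""
    with open(path) as handle:
        for line in handle:
            point = json.loads(line)
            if point.get("type") == "Point" and point.get("metric") == metric:
                yield point["data"]["value"]

def p95(values):
    ordered = sorted(values)
    return ordered[int(len(ordered) * 0.95)] if ordered else 0.0

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--results", required=True)
    parser.add_argument("--max-p95-latency", type=float, default=500)
    parser.add_argument("--max-error-rate", type=float, default=0.01)
    parser.add_argument("--max-scale-time", type=float, default=180)  # would need scaling-events.json to enforce
    args = parser.parse_args()

    durations = list(load_points(args.results, "http_req_duration"))
    failures = list(load_points(args.results, "http_req_failed"))  # 0/1 per request

    latency_p95 = p95(durations)
    error_rate = (sum(failures) / len(failures)) if failures else 0.0

    print(f"p95 latency: {latency_p95:.0f}ms (limit {args.max_p95_latency:.0f}ms)")
    print(f"error rate:  {error_rate:.2%} (limit {args.max_error_rate:.2%})")

    if latency_p95 > args.max_p95_latency or error_rate > args.max_error_rate:
        sys.exit(1)  # non-zero exit fails the CI job

if __name__ == "__main__":
    main()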
Measuring Success
| Metric | Target | How to Track |
|---|---|---|
| Scale-up latency | < 3 minutes from trigger | CloudWatch ASG metrics |
| P95 latency during scale | < 500ms | K6/Locust results |
| Error rate during spike | < 1% | K6/Locust results |
| Scale-down accuracy | Within 2x cooldown period | ASG activity logs |
| Cost efficiency | No over-provisioning | AWS Cost Explorer |
Warning signs your scalability testing isn’t working:
- Tests pass but production still has scaling issues
- Scaling policies never trigger during tests (load too low)
- Test environment doesn’t match production (different instance types, limits)
- Results vary wildly between test runs (inconsistent baseline)
Conclusion
Effective infrastructure scalability testing requires realistic scenarios and proper tooling:
- Test three scenarios minimum: sustained load, spike, and recovery
- Use ephemeral infrastructure with Terraform for cost-effective testing
- Integrate with CI/CD for regular validation
- Correlate metrics between load test results and infrastructure scaling
- Document thresholds and alert when tests fail
The key insight: auto-scaling policies need validation under real load conditions. Theoretical calculations aren’t enough—test your infrastructure before your users do.
See Also
- Terraform Testing Strategies - Infrastructure testing fundamentals
- Network Configuration Testing - Validate network can handle scale
- Backup and Disaster Recovery Testing - Ensure DR works at scale
- AWS Infrastructure Testing - Broader AWS testing strategies
- Kubernetes Testing Strategies - Container orchestration testing
