TL;DR

  • Validate auto-scaling policies with real load tests before production—K6 for JavaScript/TypeScript teams, Locust for Python teams
  • Use Terraform to provision ephemeral load test infrastructure: spin up, test, tear down—pay only for test duration
  • Test three scenarios minimum: sustained load (baseline), spike load (auto-scaling trigger), and recovery (scale-down behavior)

Best for: Teams with auto-scaling infrastructure who need to validate scaling policies and understand capacity limits

Skip if: You’re running fixed-capacity infrastructure without auto-scaling (focus on capacity planning instead)

Read time: 14 minutes

Auto-scaling policies that work in theory often fail under real load. A policy that triggers at 70% CPU might scale too slowly, leaving users waiting. Or it might scale too aggressively, wasting budget. The only way to know your infrastructure handles load correctly is to test it.

For related infrastructure testing, see Terraform Testing Strategies and Network Configuration Testing.

AI-Assisted Approaches

AI tools excel at generating load test scripts and analyzing performance patterns.

Generating K6 load test scenarios:

Write a K6 load test script that validates auto-scaling behavior:

Target application: REST API with /api/users, /api/orders endpoints
Infrastructure: AWS ALB + Auto Scaling Group (min: 2, max: 10, target CPU: 70%)

Include four test stages:

1. Warm-up: Gradually ramp to 100 VUs over 2 minutes
2. Sustained load: Maintain 500 VUs for 10 minutes (should trigger scale-up)
3. Spike: Burst to 2000 VUs for 1 minute, then back to 500
4. Cool-down: Gradually decrease to 0 over 5 minutes (should trigger scale-down)

Add thresholds for:

- p95 response time < 500ms
- Error rate < 1%
- Custom metrics to track scaling events

Include CloudWatch integration to correlate load with ASG instance count.

Analyzing auto-scaling behavior:

Analyze these load test results and auto-scaling metrics:

Load test timeline:

- 0-2min: Ramp to 100 VUs, p95=120ms
- 2-12min: 500 VUs sustained, p95 started at 150ms, grew to 800ms by minute 8
- ASG scaled from 2 to 4 instances at minute 6, to 6 instances at minute 10
- 12-13min: Spike to 2000 VUs, p95=2500ms, error rate 15%

Questions:

1. Is the scaling policy too slow? What should the target tracking value be?
2. Why did latency grow before scaling happened?
3. What explains the high error rate during the spike?
4. Recommend specific changes to the auto-scaling configuration.

Creating Locust distributed load tests:

Create a Locust load test for e-commerce checkout flow:

1. Browse products (70% of traffic)
2. Add to cart (20% of traffic)
3. Checkout (10% of traffic)

Include:

- Realistic think times between actions
- Session handling for cart state
- Custom metrics for each flow stage
- Distributed setup configuration for running on Kubernetes

Show how to run this with 10 worker pods to generate 50,000 concurrent users.

When to Use Different Testing Approaches

Testing Strategy Decision Framework

| Test Type | Tool | Purpose | When to Run |
|---|---|---|---|
| Smoke test | K6/Locust | Verify system works under minimal load | Every deployment |
| Load test | K6/Locust | Validate performance at expected load | Weekly, before releases |
| Stress test | K6/Locust | Find breaking points | Monthly, after infra changes |
| Spike test | K6/Locust | Validate auto-scaling behavior | After scaling policy changes |
| Soak test | K6/Locust | Find memory leaks, connection exhaustion | Quarterly |

Auto-Scaling Validation Checklist

| Validation | What to Check | Success Criteria |
|---|---|---|
| Scale-up trigger | Time from threshold breach to new instance | < 3 minutes |
| Scale-up capacity | New instances handle traffic immediately | No request failures |
| Scale-down trigger | Instances removed when load decreases | Within cooldown period |
| Scale-down safety | No premature termination during traffic | Zero dropped requests |
| Maximum capacity | System handles max instances' worth of load | Meets SLA at max scale |
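
The scale-up trigger row of this checklist can be verified directly from the ASG activity history. Below is a minimal sketch using boto3 (the script name and ASG name are placeholders, not from this article). It measures the duration of each launch activity, so add your CloudWatch alarm's evaluation delay if you want the full time from threshold breach to a serving instance.

# scripts/scale-up-latency.py: minimal sketch; script and ASG names are placeholders
import boto3
from datetime import datetime, timedelta, timezone

ASG_NAME = "my-app-asg"  # placeholder: your Auto Scaling Group name

autoscaling = boto3.client("autoscaling")

# Only look at activities from the most recent test window
cutoff = datetime.now(timezone.utc) - timedelta(minutes=30)

activities = autoscaling.describe_scaling_activities(
    AutoScalingGroupName=ASG_NAME, MaxRecords=50
)["Activities"]

for activity in activities:
    end = activity.get("EndTime")  # missing while the activity is still in progress
    if activity["StartTime"] < cutoff or end is None:
        continue
    latency = (end - activity["StartTime"]).total_seconds()
    print(f"{activity['Description']}: {latency:.0f}s ({activity['StatusCode']})")
    if "Launching" in activity["Description"] and latency > 180:
        print("  WARNING: scale-up exceeded the 3-minute target")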

K6 for Scalability Testing

Basic Auto-Scaling Validation Test

// tests/autoscaling-validation.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

// Custom metrics
const errorRate = new Rate('errors');
const scalingLatency = new Trend('scaling_latency');

export const options = {
  scenarios: {
    // Stage 1: Warm-up
    warmup: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
      ],
      gracefulRampDown: '0s',
      exec: 'defaultScenario',
    },
    // Stage 2: Sustained load (should trigger scale-up)
    sustained: {
      executor: 'constant-vus',
      vus: 500,
      duration: '10m',
      startTime: '2m',
      exec: 'defaultScenario',
    },
    // Stage 3: Spike (stress test auto-scaling)
    spike: {
      executor: 'ramping-vus',
      startVUs: 500,
      stages: [
        { duration: '30s', target: 2000 },
        { duration: '1m', target: 2000 },
        { duration: '30s', target: 500 },
      ],
      startTime: '12m',
      exec: 'defaultScenario',
    },
    // Stage 4: Cool-down (should trigger scale-down)
    cooldown: {
      executor: 'ramping-vus',
      startVUs: 500,
      stages: [
        { duration: '5m', target: 0 },
      ],
      startTime: '14m',
      gracefulRampDown: '30s',
      exec: 'defaultScenario',
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests under 500ms
    errors: ['rate<0.01'],              // Error rate under 1%
    http_req_failed: ['rate<0.01'],     // Failed requests under 1%
  },
};

const BASE_URL = __ENV.TARGET_URL || 'https://api.example.com';

export function defaultScenario() {
  // Simulate realistic API usage
  const endpoints = [
    { path: '/api/users', weight: 0.5 },
    { path: '/api/orders', weight: 0.3 },
    { path: '/api/products', weight: 0.2 },
  ];

  const random = Math.random();
  let cumulative = 0;
  let selectedEndpoint = endpoints[0].path;

  for (const endpoint of endpoints) {
    cumulative += endpoint.weight;
    if (random <= cumulative) {
      selectedEndpoint = endpoint.path;
      break;
    }
  }

  const response = http.get(`${BASE_URL}${selectedEndpoint}`);

  const success = check(response, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  errorRate.add(!success);

  // Realistic think time
  sleep(Math.random() * 2 + 1);
}

export function handleSummary(data) {
  return {
    'results/summary.json': JSON.stringify(data, null, 2),
    stdout: textSummary(data, { indent: ' ', enableColors: true }),
  };
}

K6 with CloudWatch Integration

// tests/k6-cloudwatch.js
import http from 'k6/http';
import { check } from 'k6';
import { AWSConfig, CloudWatchClient } from 'https://jslib.k6.io/aws/0.11.0/cloudwatch.js';

const awsConfig = new AWSConfig({
  region: __ENV.AWS_REGION || 'us-east-1',
  accessKeyId: __ENV.AWS_ACCESS_KEY_ID,
  secretAccessKey: __ENV.AWS_SECRET_ACCESS_KEY,
});

const cloudwatch = new CloudWatchClient(awsConfig);

export const options = {
  scenarios: {
    load_test: {
      executor: 'ramping-vus',
      stages: [
        { duration: '5m', target: 500 },
        { duration: '10m', target: 500 },
        { duration: '5m', target: 0 },
      ],
    },
  },
};

export default function () {
  const response = http.get(__ENV.TARGET_URL);

  check(response, {
    'status is 200': (r) => r.status === 200,
  });

  // Push custom metrics to CloudWatch
  cloudwatch.putMetricData({
    Namespace: 'K6/LoadTest',
    MetricData: [
      {
        MetricName: 'ResponseTime',
        Value: response.timings.duration,
        Unit: 'Milliseconds',
        Dimensions: [
          { Name: 'Endpoint', Value: response.url },
        ],
      },
    ],
  });
}
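
After the run, you can line the load curve up with the ASG's size over the same window by pulling the GroupInServiceInstances metric. A sketch with boto3 follows (the ASG name is a placeholder, and this metric is only published if group metrics collection is enabled on the ASG).

# scripts/fetch-instance-counts.py: sketch for correlating load with ASG size
import boto3
from datetime import datetime, timedelta, timezone

ASG_NAME = "my-app-asg"  # placeholder

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)  # match your test window

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/AutoScaling",
    MetricName="GroupInServiceInstances",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": ASG_NAME}],
    StartTime=start,
    EndTime=end,
    Period=60,               # one data point per minute
    Statistics=["Average"],
)

# Print a minute-by-minute instance count to place next to the load test timeline
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(f"{point['Timestamp'].isoformat()}  instances={point['Average']:.0f}")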

Locust for Scalability Testing

Distributed Load Test Configuration

# locustfile.py
from locust import HttpUser, task, between, events
from locust.runners import MasterRunner
import time
import logging
import random

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        """Initialize user session."""
        self.client.headers = {'Content-Type': 'application/json'}

    @task(7)
    def browse_products(self):
        """70% of traffic - Browse products."""
        with self.client.get("/api/products", catch_response=True) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Got status {response.status_code}")

    @task(2)
    def view_product_detail(self):
        """20% of traffic - View product details."""
        product_id = self.get_random_product_id()
        self.client.get(f"/api/products/{product_id}")

    @task(1)
    def checkout_flow(self):
        """10% of traffic - Full checkout flow."""
        # Add to cart
        self.client.post("/api/cart", json={
            "product_id": self.get_random_product_id(),
            "quantity": 1
        })

        # Checkout
        with self.client.post("/api/checkout", json={
            "payment_method": "card"
        }, catch_response=True) as response:
            if response.status_code in [200, 201]:
                response.success()
            elif response.status_code == 503:
                response.failure("Service unavailable - scaling issue?")
            else:
                response.failure(f"Got status {response.status_code}")

    def get_random_product_id(self):
        return random.randint(1, 1000)


# Custom metrics for scaling analysis
@events.request.add_listener
def track_response_time(request_type, name, response_time, response_length, **kwargs):
    if response_time > 1000:  # Log slow requests
        logging.warning(f"Slow request: {name} took {response_time}ms")


# Report generation
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    if isinstance(environment.runner, MasterRunner):
        print("\n=== Auto-Scaling Analysis ===")
        stats = environment.runner.stats
        print(f"Total requests: {stats.total.num_requests}")
        print(f"Failure rate: {stats.total.fail_ratio:.2%}")
        print(f"Average response time: {stats.total.avg_response_time:.0f}ms")
        print(f"95th percentile: {stats.total.get_response_time_percentile(0.95):.0f}ms")

Kubernetes Deployment for Distributed Locust

# locust-master.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust
      role: master
  template:
    metadata:
      labels:
        app: locust
        role: master
    spec:
      containers:
        - name: locust
          image: locustio/locust:2.20.0
          args:
            - --master
            - -f
            - /mnt/locust/locustfile.py
            - --host
            - $(TARGET_HOST)
          env:
            - name: TARGET_HOST
              valueFrom:
                configMapKeyRef:
                  name: locust-config
                  key: target_host
          ports:
            - containerPort: 8089
            - containerPort: 5557
          volumeMounts:
            - name: locust-scripts
              mountPath: /mnt/locust
      volumes:
        - name: locust-scripts
          configMap:
            name: locust-scripts
---
# locust-worker.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-worker
spec:
  replicas: 10  # 10 workers for distributed testing
  selector:
    matchLabels:
      app: locust
      role: worker
  template:
    metadata:
      labels:
        app: locust
        role: worker
    spec:
      containers:
        - name: locust
          image: locustio/locust:2.20.0
          args:
            - --worker
            - --master-host=locust-master
            - -f
            - /mnt/locust/locustfile.py
          volumeMounts:
            - name: locust-scripts
              mountPath: /mnt/locust
      volumes:
        - name: locust-scripts
          configMap:
            name: locust-scripts
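
With the master and ten workers running, you can start the run programmatically instead of clicking through the web UI by posting to the master's /swarm endpoint. A sketch follows; it assumes a Service named locust-master exposing ports 8089 and 5557 (not shown in the manifests above) and that you have port-forwarded the web port locally.

# scripts/start-distributed-run.py: kick off the distributed test via Locust's web API
# (assumes the master's web port is reachable, e.g. via
#  `kubectl port-forward svc/locust-master 8089:8089`)
import requests

LOCUST_MASTER = "http://localhost:8089"  # placeholder

# Start 50,000 users, spawning 500 per second across the 10 workers
response = requests.post(
    f"{LOCUST_MASTER}/swarm",
    data={"user_count": 50000, "spawn_rate": 500},
    timeout=10,
)
response.raise_for_status()
print(response.json())

# Later, after the test window, stop the run and pull aggregate stats:
#   requests.get(f"{LOCUST_MASTER}/stop", timeout=10)
#   stats = requests.get(f"{LOCUST_MASTER}/stats/requests", timeout=10).json()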

Terraform for Load Test Infrastructure

Ephemeral Load Test Environment

# modules/load-test-infra/main.tf

variable "run_load_test" {
  description = "Set to true to provision load test infrastructure"
  type        = bool
  default     = false
}

variable "worker_count" {
  description = "Number of K6/Locust workers"
  type        = number
  default     = 5
}

# ECS cluster for load generators
resource "aws_ecs_cluster" "load_test" {
  count = var.run_load_test ? 1 : 0
  name  = "load-test-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Purpose   = "LoadTesting"
    AutoClean = "true"
  }
}

# K6 task definition
resource "aws_ecs_task_definition" "k6" {
  count  = var.run_load_test ? 1 : 0
  family = "k6-load-test"

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 2048
  memory                   = 4096
  execution_role_arn       = aws_iam_role.ecs_execution[0].arn

  container_definitions = jsonencode([
    {
      name  = "k6"
      image = "grafana/k6:latest"
      command = [
        "run",
        "--out", "cloud",
        "/scripts/load-test.js"
      ]
      environment = [
        {
          name  = "TARGET_URL"
          value = var.target_url
        },
        {
          name  = "K6_CLOUD_TOKEN"
          value = var.k6_cloud_token
        }
      ]
      mountPoints = [
        {
          sourceVolume  = "scripts"
          containerPath = "/scripts"
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/k6-load-test"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "k6"
        }
      }
    }
  ])

  volume {
    name = "scripts"
    efs_volume_configuration {
      file_system_id = aws_efs_file_system.scripts[0].id
    }
  }
}

# Run load test as ECS service
resource "aws_ecs_service" "k6_workers" {
  count           = var.run_load_test ? 1 : 0
  name            = "k6-workers"
  cluster         = aws_ecs_cluster.load_test[0].id
  task_definition = aws_ecs_task_definition.k6[0].arn
  desired_count   = var.worker_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.load_test[0].id]
  }
}

output "load_test_status" {
  value = var.run_load_test ? "Load test infrastructure deployed with ${var.worker_count} workers" : "Load test infrastructure not deployed"
}
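
Everything in this module is count-gated on run_load_test, which is what makes the environment ephemeral: applying with the flag set creates the cluster and starts the K6 tasks, re-applying with it cleared destroys them, and you pay only for the window in between. A rough orchestration sketch follows (the script name and fixed wait are placeholders).

# scripts/run-ephemeral-test.py: sketch of the spin-up / test / tear-down cycle
# (assumes terraform is on PATH and is invoked from the module's working directory)
import subprocess
import time

def terraform(*args):
    subprocess.run(["terraform", *args], check=True)

try:
    # Spin up: the ECS service starts the K6 tasks as soon as it is created
    terraform("apply", "-auto-approve",
              "-var", "run_load_test=true", "-var", "worker_count=5")

    # Wait for the test window; in practice you would poll the ECS service
    # or your results backend instead of sleeping a fixed interval
    time.sleep(25 * 60)
finally:
    # Tear down: flip the flag back so the count-gated resources are destroyed
    terraform("apply", "-auto-approve", "-var", "run_load_test=false")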

CI/CD Integration

GitHub Actions Load Test Workflow

name: Scalability Testing

on:
  schedule:
    - cron: '0 4 * * 1'  # Weekly Monday 4 AM
  workflow_dispatch:
    inputs:
      test_duration:
        description: 'Test duration (e.g., 10m, 1h)'
        default: '20m'
      max_vus:
        description: 'Maximum virtual users'
        default: '1000'

jobs:
  load-test:
    runs-on: ubuntu-latest
    environment: load-test

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.LOAD_TEST_ROLE_ARN }}
          aws-region: us-east-1

      - name: Setup K6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
            --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
            | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update && sudo apt-get install k6

      - name: Record baseline metrics
        id: baseline
        run: |
          # Get current ASG instance count
          INSTANCE_COUNT=$(aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names ${{ vars.ASG_NAME }} \
            --query 'AutoScalingGroups[0].Instances | length(@)')
          echo "baseline_instances=$INSTANCE_COUNT" >> $GITHUB_OUTPUT

      - name: Run load test
        run: |
          # VU counts and durations are driven by the scenarios defined in the
          # script; the workflow inputs are exposed to it via __ENV if needed
          mkdir -p results
          k6 run tests/autoscaling-validation.js \
            --env TARGET_URL=${{ vars.TARGET_URL }} \
            --env TEST_DURATION=${{ inputs.test_duration || '20m' }} \
            --env MAX_VUS=${{ inputs.max_vus || '1000' }} \
            --out json=results/k6-results.json

      - name: Analyze scaling behavior
        run: |
          # Get scaling events during test
          aws autoscaling describe-scaling-activities \
            --auto-scaling-group-name ${{ vars.ASG_NAME }} \
            --max-items 20 \
            --query 'Activities[?StartTime>=`'$(date -d '30 minutes ago' -Iseconds)'`]' \
            > results/scaling-events.json

          python3 scripts/analyze-scaling.py \
            --k6-results results/k6-results.json \
            --scaling-events results/scaling-events.json \
            --output results/analysis.md

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: load-test-results
          path: results/

      - name: Check thresholds
        run: |
          # Fail if scaling was too slow or error rate too high
          python3 scripts/check-thresholds.py \
            --results results/k6-results.json \
            --max-p95-latency 500 \
            --max-error-rate 0.01 \
            --max-scale-time 180
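
The workflow calls scripts/analyze-scaling.py and scripts/check-thresholds.py, which aren't shown here. As a rough illustration only, a threshold gate over k6's newline-delimited JSON output might look like the sketch below; it accepts the same flags the workflow passes but omits the scaling-time check against scaling-events.json.

# scripts/check-thresholds.py: illustrative sketch, not the article's actual script
import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--results", required=True)
parser.add_argument("--max-p95-latency", type=float, default=500)
parser.add_argument("--max-error-rate", type=float, default=0.01)
parser.add_argument("--max-scale-time", type=float, default=180)  # not checked in this sketch
args = parser.parse_args()

durations, failures = [], []
with open(args.results) as f:
    for line in f:  # k6's JSON output is one record per line
        record = json.loads(line)
        if record.get("type") != "Point":
            continue
        if record["metric"] == "http_req_duration":
            durations.append(record["data"]["value"])
        elif record["metric"] == "http_req_failed":
            failures.append(record["data"]["value"])

durations.sort()
p95 = durations[int(len(durations) * 0.95)] if durations else 0.0
error_rate = sum(failures) / len(failures) if failures else 0.0

print(f"p95={p95:.0f}ms error_rate={error_rate:.2%}")
if p95 > args.max_p95_latency or error_rate > args.max_error_rate:
    sys.exit("Thresholds breached, failing the build")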

Measuring Success

| Metric | Target | How to Track |
|---|---|---|
| Scale-up latency | < 3 minutes from trigger | CloudWatch ASG metrics |
| P95 latency during scale | < 500ms | K6/Locust results |
| Error rate during spike | < 1% | K6/Locust results |
| Scale-down accuracy | Within 2x cooldown period | ASG activity logs |
| Cost efficiency | No over-provisioning | AWS Cost Explorer |

Warning signs your scalability testing isn’t working:

  • Tests pass but production still has scaling issues
  • Scaling policies never trigger during tests (load too low)
  • Test environment doesn’t match production (different instance types, limits)
  • Results vary wildly between test runs (inconsistent baseline)

Conclusion

Effective infrastructure scalability testing requires realistic scenarios and proper tooling:

  1. Test three scenarios minimum: sustained load, spike, and recovery
  2. Use ephemeral infrastructure with Terraform for cost-effective testing
  3. Integrate with CI/CD for regular validation
  4. Correlate metrics between load test results and infrastructure scaling
  5. Document thresholds and alert when tests fail

The key insight: auto-scaling policies need validation under real load conditions. Theoretical calculations aren’t enough—test your infrastructure before your users do.
