TL;DR

  • Drift happens when real infrastructure diverges from Terraform state — manual changes, console edits, or failed applies
  • driftctl scans your cloud account and compares against state, catching resources Terraform doesn’t know about
  • The #1 mistake: assuming terraform plan catches all drift (it only checks resources in state)

Best for: Teams with multiple people accessing cloud consoles or inherited infrastructure
Skip if: You’re solo, all changes go through Terraform, and you never touch the console
Read time: 10 minutes

Your Terraform state says you have 3 EC2 instances. AWS console shows 7. Someone created 4 instances manually “for testing” six months ago. They’re still running, costing $800/month, and nobody knows what they do.

This is infrastructure drift — the silent divergence between what your IaC defines and what actually exists. In 2026, with teams deploying faster and cloud complexity growing, drift detection isn’t optional. It’s how you maintain control.

The Real Problem

Drift happens through multiple vectors:

Console changes: Someone fixes a security group rule directly in AWS console. Terraform doesn’t know.

Failed applies: terraform apply partially completes before error. State and reality diverge.

Unmanaged resources: Resources created outside Terraform — by other teams, by automation, by you during debugging.

Provider bugs: Cloud provider API returns different state than what was applied. Rare but happens.

The dangerous part: drift is invisible until something breaks. That manually-edited security group? Works fine until Terraform overwrites it on next apply, killing production traffic.

Terraform Plan Isn’t Enough

terraform plan only detects drift for resources already in state. If someone creates an EC2 instance manually, Terraform has no idea it exists.

# This only shows drift for known resources
terraform plan

# Output might show:
# No changes. Your infrastructure matches the configuration.

# But reality: 4 unknown instances running in your VPC

To catch comprehensive drift, you need tools that scan your cloud account independently of Terraform state.
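The gap is easy to demonstrate without touching a real account. A toy sketch, using made-up instance IDs, of the comparison a scanner performs: list what the cloud reports, list what state knows, and diff.

```shell
# Toy illustration (made-up IDs): why `terraform plan` stays silent.
# /tmp/cloud_ids stands in for what the AWS API reports; /tmp/state_ids
# for what Terraform state knows. `comm -23` prints cloud-only lines.
printf 'i-0aaa\ni-0bbb\ni-0ccc\ni-0ddd\n' | sort > /tmp/cloud_ids
printf 'i-0aaa\ni-0bbb\n' | sort > /tmp/state_ids
comm -23 /tmp/cloud_ids /tmp/state_ids
# Prints i-0ccc and i-0ddd: instances plan will never mention.
```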

driftctl: The Dedicated Tool

driftctl (now part of Snyk) scans your cloud provider and compares against Terraform state. It finds:

  • Resources in state but changed in reality (modified)
  • Resources in cloud but not in state (unmanaged)
  • Resources in state but deleted from cloud (missing)

# Install
brew install driftctl

# Basic scan (uses AWS credentials from environment)
driftctl scan

# Scan specific state file
driftctl scan --from tfstate://terraform.tfstate

# Scan Terraform Cloud workspace
driftctl scan --from tfstate+tfcloud://WORKSPACE_ID

# Output as JSON for CI integration
driftctl scan --output json://drift-report.json

Sample output:

Found resources not covered by IaC:
  aws_instance:
    - i-0abc123def456789
    - i-0def456789abc123
  aws_security_group:
    - sg-0123456789abcdef0

Found drifted resources:
  aws_s3_bucket.data (id: my-data-bucket)
    ~ versioning.0.enabled: false => true

Found deleted resources:
  aws_iam_role.legacy (id: legacy-role)

Coverage: 87% (142/163 resources)

The coverage metric is key — it shows what percentage of your cloud resources are managed by Terraform.
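The arithmetic behind that number is just managed resources over total discovered, truncated to an integer percentage. You can reproduce the sample report's figure directly (the counts 142 and 163 come from the output above):

```shell
# Reproduce the sample report's coverage figure: managed / total,
# truncated to an integer percentage (counts from the output above).
managed=142
total=163
coverage=$(( managed * 100 / total ))
echo "Coverage: ${coverage}% (${managed}/${total} resources)"
# Prints: Coverage: 87% (142/163 resources)
```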

CI/CD Integration

Run drift detection on schedule, not just on PR:

name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 8 AM UTC
  workflow_dispatch:  # Manual trigger

jobs:
  drift-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Install driftctl
        run: |
          curl -L https://github.com/snyk/driftctl/releases/latest/download/driftctl_linux_amd64 -o driftctl
          chmod +x driftctl
          sudo mv driftctl /usr/local/bin/

      - name: Run drift scan
        id: drift
        run: |
          driftctl scan --from tfstate://terraform.tfstate --output json://drift.json
          echo "coverage=$(jq -r '.coverage' drift.json)" >> $GITHUB_OUTPUT

      - name: Check coverage threshold
        run: |
          coverage=${{ steps.drift.outputs.coverage }}
          if (( $(echo "$coverage < 80" | bc -l) )); then
            echo "Coverage $coverage% below 80% threshold"
            exit 1
          fi

      - name: Alert on drift
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Infrastructure drift detected! Coverage: ${{ steps.drift.outputs.coverage }}%"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
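The coverage gate above can be dry-run locally before wiring it into CI. A sketch, with a hand-written stand-in for driftctl's JSON report (the `coverage` field name is the one the workflow's jq expression assumes):

```shell
# Local dry run of the coverage gate. The report is a hand-written
# stand-in for real driftctl JSON output (field name assumed).
cat > /tmp/drift.json <<'EOF'
{"coverage": 72}
EOF

coverage=$(grep -o '"coverage": *[0-9]*' /tmp/drift.json | grep -o '[0-9]*$')
threshold=80
if [ "$coverage" -lt "$threshold" ]; then
  echo "Coverage ${coverage}% below ${threshold}% threshold"
  # In CI this branch would `exit 1`; here we just report.
else
  echo "Coverage ${coverage}% meets the ${threshold}% threshold"
fi
```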

Handling Unmanaged Resources

When driftctl finds unmanaged resources, you have options:

Import into Terraform:

# Generate import blocks
driftctl scan --output json://drift.json
jq -r '.unmanaged[] | "terraform import \(.type).\(.id) \(.id)"' drift.json

# Terraform 1.5+ native import blocks
import {
  to = aws_instance.imported
  id = "i-0abc123def456789"
}
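For more than a couple of resources, the import blocks can be generated mechanically. A sketch that turns a list of instance IDs (the two from the sample scan) into Terraform 1.5+ import blocks; the `imported_N` resource names are placeholders to rename before applying:

```shell
# Emit one Terraform import block per instance ID into imports.tf.
# The resource names (imported_1, imported_2, ...) are placeholders;
# rename them to something meaningful before running terraform plan.
ids="i-0abc123def456789 i-0def456789abc123"
n=0
: > /tmp/imports.tf
for id in $ids; do
  n=$((n + 1))
  cat >> /tmp/imports.tf <<EOF
import {
  to = aws_instance.imported_${n}
  id = "${id}"
}
EOF
done
cat /tmp/imports.tf
```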

Exclude from scanning (for resources intentionally unmanaged):

# .driftignore
# AWS managed attachments
aws_iam_policy_attachment
# Auto-created by Lambda
aws_cloudwatch_log_group:/aws/lambda/*
# Managed by another team
aws_security_group_rule:*

Delete manually-created resources:

# After confirming resource is safe to delete
aws ec2 terminate-instances --instance-ids i-0abc123def456789

Advanced: terraform plan -refresh-only

For resources already in state, Terraform 1.x provides dedicated drift detection:

# Check for drift without planning changes
terraform plan -refresh-only

# Apply only the state refresh (update state to match reality)
terraform apply -refresh-only

# Then plan to see what changes would restore desired state
terraform plan

This workflow separates “what changed in reality” from “what changes would Terraform make.”

Deep Dive Detection with AWS Config

For AWS-specific drift, AWS Config provides continuous monitoring:

resource "aws_config_configuration_recorder" "main" {
  name     = "config-recorder"
  role_arn = aws_iam_role.config.arn

  recording_group {
    all_supported = true
  }
}

resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"

  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }

  input_parameters = jsonencode({
    tag1Key = "Environment"
    tag2Key = "ManagedBy"
  })
}

AWS Config detects drift from compliance rules continuously, not just during scans.

AI-Assisted Approaches

Drift remediation often requires judgment calls. AI tools help.

What AI does well:

  • Analyzing drift reports to prioritize high-risk resources
  • Generating Terraform import blocks from cloud resource data
  • Suggesting whether to import, ignore, or delete unmanaged resources
  • Explaining why a resource might have drifted

What still needs humans:

  • Deciding whether drift was intentional (emergency fix vs mistake)
  • Choosing between importing vs recreating drifted resources
  • Understanding business impact of remediation options
  • Approving destructive actions (deletes, replacements)

Useful prompt:

driftctl found these unmanaged AWS resources:

- 3 EC2 instances (i-xxx) in us-east-1, t3.medium, no tags
- 2 S3 buckets with "backup" in name
- 1 RDS instance named "temp-db"

Help me:

1. Assess risk level of each
2. Recommend: import to Terraform, delete, or ignore
3. Generate import blocks for resources to keep

When This Breaks Down

Drift detection has limitations:

Resource type coverage: driftctl doesn’t support every AWS/Azure/GCP resource. New services lag.

State file access: Remote state backends (S3, Terraform Cloud) require proper authentication. Multi-workspace setups get complex.

Performance at scale: Scanning accounts with 10,000+ resources takes time. Consider filtering by resource type.

False positives: Some resources naturally drift (auto-scaling counts, dynamic IPs). You’ll accumulate ignores.

Consider complementary approaches:

  • Policy as Code to prevent unauthorized changes
  • Cloud provider native tools (AWS Config, Azure Policy) for continuous compliance
  • GitOps workflows that make manual changes impossible

Decision Framework

Run drift detection daily when:

  • Multiple teams access cloud consoles
  • You inherited infrastructure from another team
  • Compliance requires proof of configuration management
  • You’ve had drift-related incidents

Run drift detection weekly when:

  • Small team with strong Terraform discipline
  • Most changes go through CI/CD
  • Low rate of manual interventions

Skip drift detection when:

  • Solo developer with full console discipline
  • Ephemeral environments (recreated frequently)
  • Resources explicitly managed outside Terraform

Measuring Success

| Metric | Before | After | How to Track |
|---|---|---|---|
| Unmanaged resources | Unknown | <10% | driftctl coverage |
| Drift incidents per quarter | Unknown | 0 | Incident reports |
| Time to detect drift | Days/weeks | <24 hours | Scan timestamps |
| Resources imported to IaC | N/A | +50/quarter | Git history |

Warning signs it’s not working:

  • Growing .driftignore file
  • Teams disabling drift alerts
  • Coverage percentage dropping
  • Manual changes continuing despite detection

What’s Next

Start with visibility, then improve coverage:

  1. Run driftctl scan once manually to establish baseline
  2. Document all unmanaged resources (decide: import, delete, ignore)
  3. Set up daily scheduled scans
  4. Alert on new unmanaged resources
  5. Track coverage percentage over time (target: 90%+)
  6. Add driftctl to PR checks for state-changing workflows

The goal is making drift visible immediately, not during incident investigation.

