What's Actually Happening

Infrastructure drift occurs when your actual cloud resources differ from what Terraform's state file records. This can happen when resources are modified outside Terraform (manually, via console, or by other tools), or when Terraform state becomes stale. Detecting and resolving drift is critical for maintaining infrastructure integrity.

The Error You'll See

During plan:

```
Terraform detected the following changes made outside of Terraform:

  # aws_instance.web has changed
  ~ resource "aws_instance" "web" {
      ~ instance_type = "t3.micro" -> "t3.large"
      + tags          = {
          + "ManualEdit" = "true"
        }
        id            = "i-0123456789abcdef0"
    }

  # aws_security_group.web has been deleted
  - resource "aws_security_group" "web" {
        id = "sg-12345678"
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.
```

Or, as a note at the top of the plan:

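```
Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # aws_db_instance.production has been deleted
```

Terraform only compares resources it already tracks in state, so a resource created outside Terraform (such as a new S3 bucket) never appears in this output. Step 5 covers finding and importing those.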

Why This Happens

Drift occurs due to:

  1. Manual console edits - Someone modified resources via the cloud console
  2. Emergency changes - Quick fixes made without Terraform
  3. Other automation - Different tools modifying the same resources
  4. Cloud provider changes - Automatic updates or migrations by the provider
  5. State refresh failure - Terraform didn't update state properly
  6. Resource deletion - Resources deleted outside Terraform
  7. IAM/permission changes - Someone modified permissions manually
  8. Scaling events - Auto-scaling modified resources

Step 1: Run Drift Detection

Check for drift:

```bash
# Standard plan shows drift alongside proposed changes
terraform plan

# Refresh-only mode shows drift without proposing changes:
# it lists only what Terraform would update in state
terraform plan -refresh-only
```

Get detailed drift information:

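```bash
# JSON output for detailed analysis; drift is listed under resource_drift
terraform plan -refresh-only -out=tfplan
terraform show -json tfplan | jq '.resource_drift[] | {address, actions: .change.actions}'

# Accept the drift into state (replaces the deprecated `terraform refresh`;
# prompts for approval before writing state)
terraform apply -refresh-only

# Then check the plan
terraform plan
```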
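To act on the result in a script, `terraform plan -detailed-exitcode` exits with 0 when there are no changes, 1 on error, and 2 when changes are present. A minimal sketch that turns that status into a label; `describe_plan_exit` and its labels are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: label the exit status of
# `terraform plan -detailed-exitcode`.
#   0 -> no changes
#   2 -> changes present (drift and/or pending config edits)
#   anything else (normally 1) -> error
describe_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error" ;;
  esac
}

# Example:
#   terraform plan -detailed-exitcode -input=false >/dev/null
#   describe_plan_exit $?
```

Exit code 2 does not distinguish drift from configuration edits that haven't been applied yet; inspecting the plan JSON is needed to tell them apart.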

Step 2: Investigate Source of Drift

Determine who or what caused the drift:

```bash
# AWS CloudTrail - find who made changes
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-0123456789abcdef0 \
  --max-items 20 \
  --output table

# Filter by time range
aws cloudtrail lookup-events \
  --start-time 2026-04-01T00:00:00Z \
  --end-time 2026-04-04T00:00:00Z \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com

# Azure Activity Log (last 7 days; the Caller column shows who made each change)
az monitor activity-log list \
  --resource-group my-rg \
  --offset 7d \
  --output table

# GCP Audit Logs
gcloud logging read 'resource.type="gce_instance" AND logName:"cloudaudit.googleapis.com"' \
  --project my-project \
  --freshness 7d
```
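Rather than hard-coding dates for `--start-time`, the window can be computed. A small sketch, assuming GNU `date` (with a fallback to BSD/macOS `date -v`); `days_ago_utc` is a hypothetical helper name.

```bash
# Hypothetical helper: print the UTC timestamp N days ago (default 7)
# in the ISO 8601 form that `aws cloudtrail lookup-events --start-time` accepts.
days_ago_utc() {
  local days=${1:-7}
  # GNU coreutils syntax first, then BSD/macOS syntax
  date -u -d "$days days ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-"$days"d +%Y-%m-%dT%H:%M:%SZ
}

# Example:
#   aws cloudtrail lookup-events \
#     --start-time "$(days_ago_utc 3)" \
#     --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-0123456789abcdef0
```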

Check for automation sources:

```bash
# Check for other CI/CD pipelines
gh run list --workflow=infrastructure.yml

# Check for scheduled jobs
kubectl get cronjobs -n infrastructure

# Ask team members:
# "Did anyone manually edit production resources recently?"
```

Step 3: Determine Resolution Strategy

Options for handling drift:

Option 1: Accept Terraform's version (revert drift)

```bash
# Terraform will revert external changes
terraform apply

# Resources return to Terraform-defined state
```

Option 2: Accept external changes (update Terraform)

```bash
# Update the Terraform config to match actual state,
# e.g. edit main.tf to reflect the new instance_type
terraform apply

# State now matches infrastructure and config
```

Option 3: Import/recreate missing resources

```bash
# Import externally created resources
# (a matching resource block must already exist in the configuration)
terraform import aws_s3_bucket.new_bucket new-bucket-name

# Recreate deleted resources
terraform apply -target=aws_security_group.web
```

Option 4: Ignore certain drift

```hcl
# Use lifecycle ignore_changes
resource "aws_instance" "web" {
  lifecycle {
    ignore_changes = [
      instance_type,      # Allow manual scaling
      tags["ManualEdit"], # Allow external tag additions
    ]
  }
}
```
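The choice between these options mostly depends on whether the change was intentional and what kind of drift it is. A sketch of that decision as a lookup; `drift_action` and its output labels are hypothetical, and the mapping is one reasonable policy rather than a rule.

```bash
# Hypothetical helper: suggest a resolution for one drifted resource.
#   $1: intentional | unintentional
#   $2: changed | deleted | created
drift_action() {
  case "$1:$2" in
    unintentional:changed) echo "revert" ;;                    # Option 1: terraform apply
    intentional:changed)   echo "update-config" ;;             # Option 2 (or Option 4 to ignore)
    unintentional:deleted) echo "recreate" ;;                  # Option 3: terraform apply
    intentional:deleted)   echo "remove-from-config" ;;        # drop the block, then apply
    intentional:created)   echo "import" ;;                    # Option 3: terraform import
    unintentional:created) echo "delete-outside-terraform" ;;  # not in state, so apply can't remove it
    *) echo "unknown combination: investigate first" >&2; return 1 ;;
  esac
}
```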

Step 4: Handle Resource Deletion Drift

When resources are deleted outside Terraform:

```bash
# Terraform will show the deleted resource
terraform plan

# You'll see:
#   # aws_security_group.web will be created

# To recreate it
terraform apply

# Or, if you no longer want it, delete its resource block from the
# configuration, drop it from state, and apply
terraform state rm aws_security_group.web
terraform apply
```
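When several resources were deleted, it helps to list everything the plan intends to create before deciding. A sketch that reads `terraform plan -no-color` output on stdin and prints the addresses from its "will be created" lines; `planned_creates` is a hypothetical name, and it assumes Terraform's usual `  # <address> will be created` line format.

```bash
# Hypothetical helper: print addresses a plan "will create".
# Reads `terraform plan -no-color` output on stdin.
planned_creates() {
  sed -n 's/^ *# \(.*\) will be created$/\1/p'
}

# Example:
#   terraform plan -no-color | planned_creates
```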

Step 5: Handle Resource Creation Drift

When resources are created outside Terraform:

Terraform doesn't know about resources it did not create, so you need to import them. First, write a matching configuration:

```hcl
resource "aws_s3_bucket" "new_bucket" {
  bucket = "new-bucket-name"
}
```

Then import the existing resource and verify:

```bash
# Import the existing resource
terraform import aws_s3_bucket.new_bucket new-bucket-name

# Verify the configuration matches; this should show no changes.
# If it does show changes, adjust the configuration until it doesn't.
terraform plan
```
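Terraform 1.5 and later also accept `import` blocks in configuration, and `terraform plan -generate-config-out=generated.tf` can draft the matching resource blocks. A sketch that emits such blocks from "ADDRESS ID" pairs on stdin; `import_blocks` is a hypothetical helper, and the generated configuration should still be reviewed by hand.

```bash
# Hypothetical helper: turn "ADDRESS ID" lines on stdin into
# Terraform 1.5+ import blocks.
import_blocks() {
  while read -r addr id; do
    [ -n "$addr" ] || continue   # skip blank lines
    printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$addr" "$id"
  done
}

# Example:
#   echo "aws_s3_bucket.new_bucket new-bucket-name" | import_blocks > imports.tf
#   terraform plan -generate-config-out=generated.tf
```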

Bulk import new resources:

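```bash
# Find resources not tagged ManagedBy=terraform
# (the Tagging API only lists resources that are or were tagged)
aws resourcegroupstaggingapi get-resources --output json \
  | jq -r '.ResourceTagMappingList[]
      | select(any(.Tags[]; .Key == "ManagedBy" and .Value == "terraform") | not)
      | .ResourceARN'

# Import each bucket. Every address needs a matching resource block first,
# and buckets already in state will fail to import. "." is not allowed in
# resource names, so it is replaced with "_".
for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
  name=$(printf '%s' "$bucket" | tr '.' '_')
  terraform import "aws_s3_bucket.$name" "$bucket"
done
```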
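Terraform identifiers may contain letters, digits, underscores, and hyphens, and must not start with a digit, while bucket names can contain dots and start with a digit. A sketch of a stricter name converter than the `tr` call above; `tf_name` is hypothetical, and the leading `_` prefix is an arbitrary choice.

```bash
# Hypothetical helper: derive a valid Terraform resource name from an
# arbitrary string (e.g. an S3 bucket name).
tf_name() {
  printf '%s\n' "$1" |
    sed -e 's/[^A-Za-z0-9_-]/_/g' \
        -e 's/^[0-9]/_&/'   # identifiers must not start with a digit
}

# Example:
#   terraform import "aws_s3_bucket.$(tf_name "$bucket")" "$bucket"
```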

Step 6: Handle Attribute Drift

When resource attributes differ:

```bash
# Instance type changed manually. Terraform's drift note shows:
#   ~ instance_type = "t3.micro" -> "t3.large"

# Option A: Update Terraform config
# (edit main.tf: instance_type = "t3.large")
terraform apply

# Option B: Revert to original (changes back to t3.micro)
terraform apply
```

Option C: Ignore the drift

```hcl
resource "aws_instance" "web" {
  lifecycle {
    ignore_changes = [instance_type]
  }
}
```
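In a long plan, the in-place attribute changes are easy to miss. A sketch that pulls the `~ attribute = old -> new` lines out of `terraform plan -no-color` output; `changed_attrs` is a hypothetical helper, and it only handles single-line scalar changes (not nested blocks or maps).

```bash
# Hypothetical helper: summarise single-line in-place changes
# ("~ attr = old -> new") from `terraform plan -no-color` on stdin.
changed_attrs() {
  sed -n 's/^ *~ \([A-Za-z0-9_]*\) *= \(.*\) -> \(.*\)$/\1: \2 -> \3/p'
}

# Example:
#   terraform plan -no-color | changed_attrs
```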

Step 7: Implement Proactive Drift Detection

Scheduled drift detection:

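```yaml
# GitHub Actions - daily drift check
name: Drift Detection

on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 06:00 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false  # keep Terraform's raw exit codes
      # Cloud provider credentials must also be configured for plan to run
      - name: Terraform Init
        run: terraform init
      - name: Check for Drift
        id: drift
        run: |
          set +e
          terraform plan -detailed-exitcode -input=false -out=tfplan
          code=$?
          echo "exitcode=$code" >> "$GITHUB_OUTPUT"
          # 0 = no changes, 2 = changes present, 1 = error (fail the step)
          [ "$code" -ne 1 ]
      - name: Report Drift
        if: steps.drift.outputs.exitcode == '2'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          terraform show -json tfplan > drift-report.json
          # Send Slack notification
          curl -X POST "$SLACK_WEBHOOK" \
            -H 'Content-type: application/json' \
            -d '{"text":"Drift detected in production infrastructure"}'
```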
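The Slack message can name the drifted resources instead of sending a fixed string. A sketch that builds the JSON payload with quotes and backslashes escaped; `json_escape` and `slack_payload` are hypothetical helpers, and the addresses are assumed to come from `terraform show -json tfplan | jq -r '.resource_drift[].address'`.

```bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a Slack webhook payload.
#   $1: environment name, remaining args: drifted resource addresses
slack_payload() {
  local env_name text addr
  env_name=$1
  shift
  text="Drift detected in $(json_escape "$env_name"):"
  for addr in "$@"; do
    text="$text\\n- $(json_escape "$addr")"   # "\n" is a JSON newline escape
  done
  printf '{"text":"%s"}\n' "$text"
}

# Example:
#   payload=$(slack_payload production $(terraform show -json tfplan | jq -r '.resource_drift[].address'))
#   curl -X POST -H 'Content-type: application/json' -d "$payload" "$SLACK_WEBHOOK"
```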

Terraform Cloud drift detection:

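Terraform Cloud (now HCP Terraform) handles drift detection through health assessments rather than through configuration code. Enable health assessments in the organization settings or in a workspace's Settings > Health; Terraform Cloud then periodically checks the workspace for drift and flags drifted resources. Run triggers are a separate feature: they start a run in one workspace after another workspace applies, and they do not schedule drift checks.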

Step 8: Fix State Corruption Drift

When state itself is corrupted:

```bash
# State shows different values than actual resources
terraform state show aws_instance.web

# Compare with actual
aws ec2 describe-instances --instance-ids i-0123456789abcdef0

# If state is merely stale, sync it
# (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only

# Or correct it manually
terraform state rm aws_instance.web
terraform import aws_instance.web i-0123456789abcdef0
```
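To compare a single attribute rather than reading both outputs by eye, extract the value from `terraform state show`. A sketch assuming plain (uncolored) output in the usual `name = "value"` layout; `state_attr` is a hypothetical helper that returns only the first matching line, so nested blocks with the same attribute name can confuse it.

```bash
# Hypothetical helper: print the value of one top-level attribute from
# `terraform state show` output read on stdin (surrounding quotes removed).
state_attr() {
  awk -v key="$1" '
    $1 == key && $2 == "=" {
      sub(/^[^=]*=[ \t]*/, "")   # drop "name =" and the following spaces
      gsub(/^"|"$/, "")          # drop surrounding quotes
      print
      exit
    }'
}

# Example:
#   state=$(terraform state show aws_instance.web | state_attr instance_type)
#   actual=$(aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
#     --query 'Reservations[0].Instances[0].InstanceType' --output text)
#   [ "$state" = "$actual" ] || echo "drift: state=$state actual=$actual"
```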

Step 9: Document Drift Policy

Create drift handling policy:

```markdown
## Infrastructure Drift Policy

### Detection
- Daily automated drift detection runs at 6 AM
- Drift alerts sent to #infrastructure Slack channel
- On-call engineer investigates within 24 hours

### Resolution
1. Identify source of drift via CloudTrail
2. Determine if drift was intentional
3. Intentional: Update Terraform config to match
4. Unintentional: Apply Terraform to revert
5. Document resolution in change log

### Prevention
- All changes must go through Terraform
- Manual changes only in emergencies
- Tag manually modified resources: ManualEdit=true
- Review drift report weekly
```

Step 10: Implement Tagging for Drift Prevention

Tag resources to prevent drift confusion:

```hcl
# Tag all Terraform-managed resources
provider "aws" {
  default_tags {
    tags = {
      ManagedBy      = "terraform"
      TerraformStack = "production"
      # Avoid timestamp() here: it changes on every run,
      # so every plan would show a tag diff.
    }
  }
}

resource "aws_instance" "web" {
  # ManagedBy is inherited from default_tags
  tags = {
    Name = "web-server"
  }
}
```

Check for unmanaged resources:

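```bash
# Find resources not tagged ManagedBy=terraform
# (the Tagging API only lists resources that are or were tagged)
aws resourcegroupstaggingapi get-resources --output json \
  | jq -r '.ResourceTagMappingList[]
      | select(any(.Tags[]; .Key == "ManagedBy" and .Value == "terraform") | not)
      | .ResourceARN'
```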

Verify the Fix

After resolving drift:

```bash
# Run plan to verify
terraform plan

# Should show:
# No changes. Your infrastructure matches the configuration.

# Run it again to ensure consistency; the second run should also report no changes
terraform plan

# Check state matches reality: list the IDs recorded in state
terraform state pull | jq -r '.resources[].instances[].attributes.id'
```
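The "run it more than once" check can be scripted. A sketch that runs a plan command N times and succeeds only if every run exits 0, which under `-detailed-exitcode` means no changes; `plan_is_clean` is a hypothetical helper name.

```bash
# Hypothetical helper: run a command N times; succeed only if every run
# exits 0 (with -detailed-exitcode, 0 means "no changes").
plan_is_clean() {
  local runs i
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1 || return 1
    i=$((i + 1))
  done
}

# Example:
#   plan_is_clean 2 terraform plan -detailed-exitcode -input=false && echo "no drift"
```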

Prevention Best Practices

Prevent drift from occurring:

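```hcl
# prevent_destroy does not stop drift, but Terraform will refuse any plan
# that would destroy this resource (for example, a replacement triggered
# while reverting drift)
resource "aws_db_instance" "production" {
  lifecycle {
    prevent_destroy = true
  }
}
```

Also document every manual change (tag affected resources with ManualEdit=true) and implement change management so that all changes are approved before implementation.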

Require Terraform for all changes:

```markdown
## Infrastructure Change Process
1. Create Terraform PR
2. Review by team
3. Approval required
4. Apply via CI/CD
5. Manual changes only in emergencies
6. Document emergency changes immediately
```