What's Actually Happening

You run terraform plan expecting "No changes" but instead see resources being modified, recreated, or destroyed. This phantom drift occurs when Terraform detects differences between your state and configuration, or between state and real infrastructure, even though you haven't intentionally changed anything.

The Error You'll See

```
Terraform will perform the following actions:

  # aws_instance.example will be updated in-place
  ~ resource "aws_instance" "example" {
      ~ ami            = "ami-0c55b159cbfafe1f0" -> "ami-1a2b3c4d"
      ~ instance_state = "running" -> "stopped"
        id             = "i-0123456789abcdef0"
        # (10 unchanged attributes hidden)
    }

  # aws_s3_bucket.data will be updated
  ~ resource "aws_s3_bucket" "data" {
      ~ versioning {
          ~ enabled = false -> true
        }
    }

Plan: 0 to add, 2 to change, 0 to destroy.
```

Or sometimes completely unexpected replacements:

```
  # aws_lb.target must be replaced
-/+ resource "aws_lb" "target" {
      ~ access_logs {
          ~ bucket = "my-logs-bucket" -> "my-logs-bucket-new" # forces replacement
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.
```

Why This Happens

Unexpected plan changes occur due to:

  1. Provider version upgrades - New provider versions handle attributes differently
  2. API response changes - Cloud provider updated their API default values
  3. Computed attribute drift - Provider-calculated values differ from state
  4. External modifications - Someone changed resources outside Terraform
  5. State not refreshed - State contains stale values from previous runs
  6. Terraform version upgrade - Core version changed attribute handling
  7. Module updates - Upstream module changed default values
  8. JSON/map ordering - Non-deterministic ordering of map keys

Step 1: Investigate the Specific Drift

Understand exactly what's changing:

```bash
# Get detailed plan with full attributes
terraform plan -out=tfplan
terraform show -json tfplan | jq '.resource_changes[]'

# Focus on specific resource
terraform plan -target=aws_instance.example

# Show current state values
terraform state show aws_instance.example

# Compare with actual resource via API
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 --output json
```

Extract the exact differences:

```bash
# Get before and after values
terraform show -json tfplan | jq '.resource_changes[] | select(.address == "aws_instance.example") | .change'

# Show the planned actions for each resource
terraform show -json tfplan | jq '.resource_changes[] | {address, actions: .change.actions}'
```
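If jq is not available, the same comparison can be sketched in Python. This is a minimal example, not a full plan parser: it assumes the plan was exported with `terraform show -json tfplan`, only compares top-level attributes, and the function name `changed_attributes` and the sample data are hypothetical. Values that are unknown until apply are missing from `after`, so they will be reported as changed.

```python
def changed_attributes(plan: dict) -> dict:
    """Map each changing resource address to its actions and differing top-level attributes."""
    summary = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Skip resources that Terraform does not intend to modify.
        if actions in (["no-op"], ["read"]):
            continue
        before = change.get("before") or {}
        after = change.get("after") or {}
        keys = sorted(
            k for k in set(before) | set(after) if before.get(k) != after.get(k)
        )
        summary[rc["address"]] = {"actions": actions, "changed": keys}
    return summary


# Hypothetical sample shaped like the JSON plan output (not real data).
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_instance.example",
            "change": {
                "actions": ["update"],
                "before": {"ami": "ami-0c55b159cbfafe1f0", "instance_type": "t3.micro"},
                "after": {"ami": "ami-1a2b3c4d", "instance_type": "t3.micro"},
            },
        },
        {
            "address": "aws_s3_bucket.logs",
            "change": {"actions": ["no-op"], "before": {}, "after": {}},
        },
    ]
}

print(changed_attributes(sample_plan))
```

To use it on a real plan, load the file with `json.load` and pass the resulting dictionary to the function.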

Step 2: Check for External Modifications

Verify whether resources were modified outside Terraform:

```bash
# AWS CloudTrail - who changed this resource?
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-0123456789abcdef0 \
  --max-items 20 \
  --output table

# Check for recent console changes (GNU date syntax; adjust on macOS/BSD)
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --output table

# Azure Activity Log
az monitor activity-log list \
  --resource-group my-rg \
  --caller unknown \
  --output table

# GCP Audit Logs
gcloud logging read 'resource.type=gce_instance AND protoPayload.resourceName=~"instances/example"'
```
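CloudTrail output is dominated by read-only calls, which cannot cause drift. The sketch below filters them out, assuming you saved the result of `aws cloudtrail lookup-events --output json` and that each event carries the `ReadOnly`, `EventTime`, `Username` and `EventName` fields. The function name `write_events` and the sample events are hypothetical.

```python
def write_events(lookup_output: dict) -> list:
    """Return (time, user, event name) tuples for events that are not read-only."""
    rows = []
    for event in lookup_output.get("Events", []):
        # ReadOnly is reported as the string "true" or "false".
        if str(event.get("ReadOnly", "")).lower() == "true":
            continue
        rows.append(
            (event.get("EventTime"), event.get("Username"), event.get("EventName"))
        )
    return rows


# Hypothetical sample shaped like lookup-events JSON output (not real data).
sample_output = {
    "Events": [
        {
            "EventTime": "2024-01-15T10:30:00Z",
            "Username": "terraform-ci",
            "EventName": "DescribeInstances",
            "ReadOnly": "true",
        },
        {
            "EventTime": "2024-01-15T10:32:00Z",
            "Username": "alice",
            "EventName": "StopInstances",
            "ReadOnly": "false",
        },
    ]
}

for row in write_events(sample_output):
    print(row)
```

Any user in the result that is not your Terraform automation identity is a candidate for the out-of-band change.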

Ask your team: "Did anyone manually modify production resources recently?"

Step 3: Refresh State Against Reality

Force state to sync with actual infrastructure:

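```bash
# Legacy refresh command (deprecated since Terraform 0.15.4)
terraform refresh

# Preferred: refresh-only mode (Terraform 0.15.4+)
terraform plan -refresh-only    # Preview what the refresh would change in state
terraform apply -refresh-only   # Write the refreshed values to state

# Check if drift resolved
terraform plan
```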

If refresh doesn't help:

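```bash
# Pull state manually
terraform state pull > current-state.json

# Fetch the live resource from the API
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 > actual-state.json

# The two files use different schemas, so a raw diff is mostly noise.
# Compare individual attributes instead, for example the AMI:
jq -r '.resources[] | select(.type == "aws_instance" and .name == "example") | .instances[0].attributes.ami' current-state.json
jq -r '.Reservations[0].Instances[0].ImageId' actual-state.json
```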

Step 4: Handle Provider Default Changes

When provider upgrades change default handling:

```bash
# Check which providers the configuration requires
terraform providers

# Check the installed Terraform and provider versions
terraform version

# Check the versions recorded in the lock file
grep -A 2 '^provider' .terraform.lock.hcl

# See what changed in a provider release:
# check the provider changelog on GitHub
```

Pin provider versions to avoid unexpected upgrades:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.0.0" # Pin exact version
    }
  }
}

provider "aws" {
  region = "us-east-1"

  # Explicitly set values that might have new defaults
  default_tags {
    tags = {
      ManagedBy = "terraform"
    }
  }
}
```
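To confirm which versions are actually locked, for example in a CI check, the lock file can be read with a small script. This is a rough sketch that assumes the layout `terraform init` normally writes, where `version` is the first attribute in each `provider` block; it is a regex, not a real HCL parser. The function name `locked_versions` and the sample lock text are hypothetical.

```python
import re

# Matches: provider "<address>" { version = "<version>"
LOCK_ENTRY = re.compile(r'provider\s+"([^"]+)"\s*\{\s*version\s*=\s*"([^"]+)"')


def locked_versions(lock_text: str) -> dict:
    """Return a mapping of provider address to locked version."""
    return dict(LOCK_ENTRY.findall(lock_text))


# Hypothetical lock file content (the hash is a placeholder, not a real value).
sample_lock = '''
# This file is maintained automatically by "terraform init".

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.0.0"
  constraints = "5.0.0"
  hashes = [
    "h1:EXAMPLEONLY=",
  ]
}
'''

print(locked_versions(sample_lock))
```

In practice you would pass the contents of `.terraform.lock.hcl` and compare the result against the version you expect to be pinned.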

Step 5: Use Lifecycle Rules to Ignore Drift

For attributes that legitimately fluctuate:

```hcl
resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    ignore_changes = [
      ami,                  # AMI might auto-update
      user_data,            # Base64 encoding might differ
      root_block_device,    # Volume modifications outside Terraform
      tags["LastModified"], # External tag updates
    ]
  }
}

resource "aws_security_group" "main" {
  name = "main-sg"

  lifecycle {
    # Ignore rules added by AWS or other systems
    ignore_changes = [ingress, egress]

    # Or create_before_destroy for replacements
    create_before_destroy = true
  }
}
```

Step 6: Fix Computed Attribute Issues

Some attributes are provider-computed and cause false drift:

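```bash
# Identify computed attributes from the provider schema
# (terraform state show does not mark which attributes are computed)
terraform providers schema -json \
  | jq -r '.provider_schemas["registry.terraform.io/hashicorp/aws"].resource_schemas["aws_instance"].block.attributes
           | to_entries[] | select(.value.computed == true) | .key'

# Check provider documentation for computed vs configurable
```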

Handle computed vs configurable conflicts:

```hcl
resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"

  # Some attributes need explicit values to avoid drift
  versioning {
    enabled = false # Explicit, not relying on default
  }

  # For computed hash values
  lifecycle {
    ignore_changes = [
      # Ignore computed checksums
    ]
  }
}
```

Step 7: Fix JSON and Map Ordering Issues

For non-deterministic ordering:

```bash
# Terraform normalizes JSON/map ordering
# This can cause apparent changes with no actual difference
terraform plan -out=tfplan
terraform show tfplan | grep -A5 "changed"
```

Convert to deterministic types:

```hcl
# Use explicit lists instead of maps for ordering-dependent items
variable "subnet_cidrs" {
  type    = list(string) # Deterministic order
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

# For JSON content, build it with jsonencode instead of a hand-written string
resource "aws_s3_object" "config" {
  bucket = "my-bucket"
  key    = "config.json"
  content = jsonencode({
    # jsonencode emits object keys in a consistent order
    setting1 = "value1"
    setting2 = "value2"
  })
}
```
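Before changing any configuration, it helps to confirm that a JSON diff really is cosmetic. The check below is a small sketch: paste the before and after strings from the plan and compare them as parsed values. The function name `is_cosmetic_json_diff` is hypothetical. Object key order and whitespace are ignored, but array order still counts as a real difference.

```python
import json


def is_cosmetic_json_diff(before: str, after: str) -> bool:
    """True when two JSON strings differ as text but parse to equal values."""
    return before != after and json.loads(before) == json.loads(after)


# Same settings, different key order and whitespace.
state_value = '{"setting2": "value2", "setting1": "value1"}'
config_value = '{"setting1":"value1","setting2":"value2"}'

print(is_cosmetic_json_diff(state_value, config_value))
```

If the function returns `False` for two different strings, the values differ in substance and the change in the plan is genuine.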

Step 8: Fix State Attribute Mismatches

Directly fix state when it contains wrong values:

```bash
# Remove resource with incorrect state
terraform state rm aws_instance.example

# Re-import with correct values
terraform import aws_instance.example i-0123456789abcdef0

# For specific attributes, use state editing
terraform state pull > state.json
# Edit the JSON manually (advanced, risky)
terraform state push state.json
```

Step 9: Implement Proactive Drift Detection

Catch drift before it surprises you:

```yaml
# GitHub Actions drift detection workflow
name: Drift Detection
on:
  schedule:
    - cron: '0 6 * * *' # Daily 6 AM

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Check for Drift
        id: plan
        run: |
          terraform plan -detailed-exitcode -out=tfplan
        continue-on-error: true

      - name: Alert on Drift
        if: steps.plan.outcome == 'failure'
        run: |
          echo "::warning::Infrastructure drift detected!"
          terraform show tfplan
          # Send Slack/email notification
```
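With `-detailed-exitcode`, `terraform plan` exits with 0 when there are no changes, 1 on error, and 2 when changes are present, so a failed step can mean either drift or a broken run. The sketch below separates the two cases for use outside GitHub Actions. The names `classify_plan_exit` and `check_drift` are hypothetical, and `check_drift` assumes `terraform` is on the PATH and the working directory is already initialized.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
MEANINGS = {0: "clean", 1: "error", 2: "drift"}


def classify_plan_exit(code: int) -> str:
    """Translate a plan exit code into clean, drift, or error."""
    # Anything unexpected is treated as an error rather than as drift.
    return MEANINGS.get(code, "error")


def check_drift(workdir: str = ".") -> str:
    """Run a plan in workdir and report clean, drift, or error."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A scheduled job could call `check_drift()` and send a notification only when the result is `"drift"`, while treating `"error"` as a pipeline failure to investigate separately.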

Verify the Fix

After addressing drift:

```bash
# Confirm plan is clean
terraform plan

# Should output: No changes. Your infrastructure matches the configuration.

# Run multiple times for consistency
terraform plan
terraform plan # Second run should be identical
```

Verify state matches reality:

```bash
terraform plan -refresh-only
# Should show no changes
```

Prevention Best Practices

Always pin provider versions:

```hcl
terraform {
  required_version = "~> 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```

Use lifecycle rules strategically:

```hcl
# Only ignore changes that legitimately vary
lifecycle {
  ignore_changes = [
    # Document why each attribute is ignored
    tags["UpdatedAt"], # Auto-updated by external system
  ]
}
```

Lock file management:

```bash
# Commit lock file for consistency
git add .terraform.lock.hcl
git commit -m "Update provider lock"

# Team members use same versions
terraform init
```