Introduction

A Terraform state drift mismatch occurs when the real infrastructure differs from what the state file records, so terraform plan shows changes nobody expected. Drift appears when resources are modified outside Terraform (manual console changes, direct API calls, other automation tools) or when the state file itself becomes inconsistent after concurrent runs, failed applies, or backend synchronization problems. Left uncorrected, drift can cause failed deployments, unintended resource replacement, or production outages.

Symptoms

  • terraform plan shows changes for resources that should be unchanged
  • terraform apply attempts to recreate existing resources
  • State lock errors during plan/apply operations
  • Remote backend shows different state than local terraform.tfstate
  • CI/CD pipeline fails with "state locked by another process" errors
  • Issue appears after manual hotfix, emergency console change, or concurrent pipeline runs

Common Causes

  • Manual infrastructure changes via cloud console or CLI
  • Another Terraform run modified state concurrently
  • State file corruption during failed apply or network interruption
  • Remote backend (S3, Azure Blob, GCS) synchronization delays
  • Different Terraform versions with incompatible state formats
  • Module or provider version changes altering resource schemas
  • Imported resources with incomplete state mapping

Step-by-Step Fix

### 1. Detect drift with terraform plan

Run plan to identify drifted resources:

```bash
# Generate a plan and save it to a file
terraform plan -out=tfplan

# List only resources with planned changes (actions live under .change)
terraform show -json tfplan | jq '.resource_changes[] | select(.change.actions != ["no-op"]) | {address, actions: .change.actions}'

# Or rely on the exit code for scripting
terraform plan -detailed-exitcode
# Exit code 0 = no changes, 1 = error, 2 = changes present
```
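The exit-code convention above is easy to mishandle in scripts that run with `set -e`, because exit code 2 looks like a failure even though it only means changes were found. A minimal sketch, assuming a POSIX-compatible shell; `plan_status` and `check_drift` are hypothetical helper names, and the flags are standard `terraform plan` options.

```bash
# Map a `terraform plan -detailed-exitcode` result to a label.
plan_status() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "drift-or-changes" ;;
    *) echo "error" ;;   # 1 (or anything unexpected) means the plan itself failed
  esac
}

# Run the plan without letting exit code 2 abort a `set -e` script.
check_drift() {
  local code=0 status
  terraform plan -input=false -detailed-exitcode -out=tfplan || code=$?
  status=$(plan_status "$code")
  echo "plan status: $status"
  # Succeed for 0 and 2; fail only when Terraform reported an error.
  [ "$status" != "error" ]
}
```

Used as a pipeline step, `check_drift` fails the step only on a real Terraform error, while detected drift is reported and left for a later step to act on.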

For a more comprehensive drift report:

```bash
# TDrift (third-party tool)
tdrift --plan-out=tfplan --format=json

# Or parse terraform show output
terraform show -no-color tfplan > drift-report.txt
```
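To separate changes made outside Terraform from changes Terraform itself wants to make, the JSON plan can be split by key. A sketch assuming jq is installed and a Terraform version whose JSON plan includes the `resource_drift` array (the `// []` fallback keeps it working when that key is absent); `summarize_plan` is a hypothetical name.

```bash
# Print one line per resource: "drift" for changes detected outside Terraform,
# "planned" for changes Terraform intends to make (no-op entries skipped).
summarize_plan() {
  jq -r '
    ((.resource_drift // [])[]
      | "drift   \(.change.actions | join(",")) \(.address)"),
    ((.resource_changes // [])[]
      | select(.change.actions != ["no-op"])
      | "planned \(.change.actions | join(",")) \(.address)")
  ' "$1"
}
```

Usage: `terraform show -json tfplan > plan.json && summarize_plan plan.json`.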

### 2. Check state lock status

Check whether another process holds the state lock:

```bash
# Terraform shows lock info automatically
terraform plan

# Example lock error:
# Error: Error locking state: Error acquiring the state lock
# Lock Info:
#   ID:        1a2b3c4d-5e6f-7g8h-9i0j
#   Path:      s3://bucket/terraform/prod/terraform.tfstate
#   Operation: OperationTypeApply
#   Who:       user@host
#   Version:   1.5.7
#   Created:   2026-03-30 10:00:00.000 +0000 UTC

# Force unlock (use with caution: make sure no apply is actually running)
terraform force-unlock 1a2b3c4d-5e6f-7g8h-9i0j
```

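Before force-unlocking:

- Verify that no CI/CD pipeline run is actually in progress
- Check the cloud provider for active API operations
- Use the Who, Operation, and Created fields in the lock info to identify the holder
- Confirm with the team that no one is running Terraform manually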
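Copying the lock ID by hand is error-prone, and force-unlocking the wrong lock is worse than waiting. A minimal sketch: `extract_lock_id` and `confirm_force_unlock` are hypothetical helpers; the sed pattern assumes the `ID:` line layout shown in the example error above (leading spaces or a box-drawing prefix are tolerated), and `-force` on `terraform force-unlock` only skips Terraform's own prompt after the helper has asked for confirmation.

```bash
# Print the first lock ID found in Terraform's "Lock Info" block (read from stdin).
extract_lock_id() {
  sed -n 's/^[^[:alnum:]]*ID:[[:space:]]*//p' | head -n 1
}

# Require the operator to retype the ID before force-unlocking.
confirm_force_unlock() {
  local lock_id=$1 answer
  [ -n "$lock_id" ] || { echo "no lock ID found" >&2; return 1; }
  printf 'Force-unlock %s? Retype the ID to confirm: ' "$lock_id" >&2
  read -r answer
  [ "$answer" = "$lock_id" ] || { echo "aborted" >&2; return 1; }
  terraform force-unlock -force "$lock_id"
}
```

Usage: `terraform plan -no-color -input=false 2> lock-error.txt; confirm_force_unlock "$(extract_lock_id < lock-error.txt)"`.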

### 3. Refresh state from cloud provider

Sync the state file with the actual infrastructure:

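```bash
# Update state to match real infrastructure (the infrastructure itself is not changed).
# `terraform refresh` still works but is deprecated; it acts like
# `terraform apply -refresh-only -auto-approve`, so it rewrites state without review.
terraform apply -refresh-only

# Preview what a refresh would change, without writing state
terraform plan -refresh-only

# For a specific resource
terraform apply -refresh-only -target=aws_instance.web
```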

After refreshing, review what changed:

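```bash
# Inspect current state (works with remote backends; no local state file needed)
terraform show -json | jq '.values.root_module.resources[] | {address, values}'

# Compare with a previous version (requires S3 bucket versioning, see step 9).
# S3 versions are fetched by version ID, not by a file-name suffix.
aws s3api get-object --bucket my-terraform-state --key terraform/prod/terraform.tfstate \
  --version-id VERSION_ID previous.tfstate
terraform state pull > current.tfstate
diff previous.tfstate current.tfstate | head -50
```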
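Rather than accepting a refresh blindly, a saved refresh-only plan lets someone review the detected drift and then apply exactly that plan. A minimal sketch, assuming Terraform 0.15.4 or later (where `-refresh-only` was introduced); `refresh_with_review` and the `refresh.tfplan` file name are hypothetical.

```bash
# Plan a refresh, ask for approval, then apply the saved refresh-only plan.
refresh_with_review() {
  local answer
  terraform plan -refresh-only -input=false -out=refresh.tfplan || return 1
  printf 'Accept these changes into state? [yes/no] ' >&2
  read -r answer
  if [ "$answer" = "yes" ]; then
    # Applying a refresh-only plan rewrites state only, never infrastructure.
    terraform apply refresh.tfplan
  else
    echo "state left unchanged" >&2
    return 1
  fi
}
```

Because the approved plan file is applied as-is, drift that appears between review and apply is not silently absorbed into state.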

### 4. Handle manually changed resources

For resources modified outside Terraform, choose a reconciliation strategy:

**Option A: Accept infrastructure changes (import to state)**

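```bash
# If the resource was created manually, first add a matching resource block
# (resource "aws_instance" "web" { ... }); terraform import requires it to exist
terraform import aws_instance.web i-0abc123def456

# Then edit main.tf until its attributes match the imported resource,
# and rerun plan until it reports no changes
terraform plan
```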
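On Terraform 1.5 and later, Option A can also be done declaratively with an `import` block, which Terraform plans like any other change and can use to draft configuration. A sketch assuming Terraform 1.5+; `write_import_block` and the default `imports.tf` file name are hypothetical.

```bash
# Append an HCL import block for the given resource address and cloud ID.
write_import_block() {
  local address=$1 id=$2 file=${3:-imports.tf}
  cat >> "$file" <<EOF
import {
  to = $address
  id = "$id"
}
EOF
}
```

Usage: `write_import_block aws_instance.web i-0abc123def456`, then `terraform plan -generate-config-out=generated.tf` (the output file must not already exist; config is drafted only for import targets without a resource block). Review the generated block, move it into `main.tf`, and apply.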

**Option B: Revert infrastructure to match state**

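```bash
# Apply the configuration to overwrite the manual changes
terraform apply -target=aws_instance.web

# Warning: if a drifted attribute forces replacement, this destroys and
# recreates the resource. Read the plan before confirming.
```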

**Option C: Update state to match infrastructure**

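```bash
# Rebuild the state entry, e.g. when it tracks the wrong object ID
terraform state show aws_instance.web
terraform state rm aws_instance.web   # removes from state only; nothing is destroyed
terraform import aws_instance.web i-0abc123def456
# For attribute-only drift, terraform apply -refresh-only (step 3) is usually enough
```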

### 5. Resolve concurrent run conflicts

Prevent multiple Terraform runs from corrupting state:

```bash
# Terraform has no command that prints the backend; inspect the backend block in the configuration
grep -rn 'backend "' --include='*.tf' .
```

```hcl
# S3 backend with DynamoDB locking (recommended)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "terraform/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # Enables state locking
    # Terraform 1.10+ can also lock in S3 itself with: use_lockfile = true
  }
}

# Azure Blob backend (built-in locking via blob leases)
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-rg"
    storage_account_name = "tfstate"
    container_name       = "tfstate"
    key                  = "prod/terraform.tfstate"
  }
}
```

CI/CD pipeline serialization:

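```yaml
# GitHub Actions: one run at a time per state file
jobs:
  terraform:
    runs-on: ubuntu-latest
    concurrency:
      # Group by state/environment, not by branch, so runs from different branches also queue
      group: terraform-prod
      cancel-in-progress: false # Don't cancel mid-apply
      # Note: only one run can wait; a newer queued run replaces an older pending one
    steps:
      # ... checkout, init, plan, apply ...
```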
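Pipeline concurrency does not cover a manual run colliding with CI. A sketch that makes Terraform wait for the lock instead of failing immediately: `-lock-timeout` is a standard flag on `plan` and `apply`, while `tf_locked`, the `LOCK_WAIT` variable, and the 10-minute default are assumptions.

```bash
# Run a Terraform subcommand that waits up to LOCK_WAIT for the state lock.
tf_locked() {
  local cmd=$1
  shift
  # -lock-timeout goes right after the subcommand, before any plan-file argument.
  terraform "$cmd" -lock-timeout="${LOCK_WAIT:-10m}" "$@"
}
```

Usage: `tf_locked plan -out=tfplan && tf_locked apply tfplan`. Without a wrapper, the same flag can be supplied through the `TF_CLI_ARGS_plan` and `TF_CLI_ARGS_apply` environment variables.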

### 6. Handle state version conflicts

The remote state and a local state file can diverge:

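```bash
# Pull the latest state from the remote backend
terraform state pull > remote-state.tfstate

# Compare with a local state file (e.g. one left over from before a backend migration)
diff terraform.tfstate remote-state.tfstate

# If remote is authoritative, move the stale local copy aside;
# with a remote backend configured, Terraform reads the remote, not this file
mv terraform.tfstate terraform.tfstate.stale

# If local should win, push it to the remote (dangerous; refused on lineage mismatch or older serial)
terraform state push terraform.tfstate
```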

State push with force (use only when certain):

```bash
# Back up the current remote state first
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate

# Then force-push local state, bypassing the lineage and serial safety checks
terraform state push -force terraform.tfstate
```
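`terraform state push` refuses a push when the lineage differs or the remote serial is higher, and reaching for `-force` without knowing which check failed is how states get clobbered. A sketch assuming jq; `compare_states` is a hypothetical helper that reports which situation applies before any push.

```bash
# Compare lineage and serial of a local state file against a pulled remote copy.
# Returns 0 if local is newer, 1 if it is not newer, 2 if lineages differ.
compare_states() {
  local lf=$1 rf=$2 l_lineage r_lineage l_serial r_serial
  l_lineage=$(jq -r '.lineage' "$lf") || return 3
  r_lineage=$(jq -r '.lineage' "$rf") || return 3
  l_serial=$(jq -r '.serial' "$lf")
  r_serial=$(jq -r '.serial' "$rf")
  if [ "$l_lineage" != "$r_lineage" ]; then
    echo "lineage differs ($l_lineage vs $r_lineage): unrelated states, do not push"
    return 2
  fi
  if [ "$l_serial" -le "$r_serial" ]; then
    echo "local serial $l_serial is not newer than remote serial $r_serial"
    return 1
  fi
  echo "local serial $l_serial is newer than remote serial $r_serial"
}
```

Usage: `terraform state pull > remote-state.tfstate && compare_states terraform.tfstate remote-state.tfstate`.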

### 7. Fix corrupted or inconsistent state

Repair state file issues:

```bash
# Validate configuration syntax (terraform validate does not inspect state)
terraform validate

# Confirm the state itself can be read and parsed
terraform state pull | jq empty

# List resources tracked in state to spot orphans
terraform state list

# Remove resource from state (does not destroy infrastructure)
terraform state rm aws_instance.web

# Move resource to a different address (after refactoring)
terraform state mv aws_instance.web aws_instance.web_server

# Import missing resource
terraform import aws_instance.web i-0abc123def456
```

For severe corruption:

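```bash
# Export state to JSON for manual repair
terraform state pull > state.json

# Edit a copy (state-fixed.json) and increment its top-level "serial";
# push rejects a different state that reuses the current serial
cat state-fixed.json | terraform state push -
```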
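After a hand edit, two things commonly break the push: invalid JSON and an unchanged serial. A minimal sketch assuming jq; `prepare_state_for_push` is a hypothetical helper that validates the edited file and writes a copy with the serial incremented by one.

```bash
# Validate an edited state file and write a copy with serial + 1.
prepare_state_for_push() {
  local src=$1 dst=$2
  # jq exits non-zero on a JSON syntax error, catching broken edits early.
  jq empty "$src" || { echo "invalid JSON in $src" >&2; return 1; }
  jq '.serial += 1' "$src" > "$dst"
}
```

Usage: `prepare_state_for_push state.json state-fixed.json && terraform state push state-fixed.json`.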

### 8. Implement drift detection automation

Schedule regular drift detection:

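```yaml
# GitHub Actions: daily drift check
name: Terraform Drift Detection
on:
  schedule:
    - cron: '0 6 * * *' # Daily at 06:00 UTC (scheduled workflows use UTC)

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false # plain stdout, so jq can parse terraform show output
      # Configure cloud credentials here (provider-specific)
      - run: terraform init -input=false
      - id: plan
        run: |
          set +e
          terraform plan -input=false -out=tfplan -detailed-exitcode
          code=$?
          echo "exitcode=$code" >> "$GITHUB_OUTPUT"
          # Fail only on a real error (1); 0 = no changes, 2 = drift detected
          [ "$code" -ne 1 ]
      - name: Report drift
        if: steps.plan.outputs.exitcode == '2'
        run: |
          terraform show -json tfplan | jq '[.resource_changes[] | select(.change.actions != ["no-op"])]' > drift-report.json
          # Send to Slack, email, or create GitHub issue
```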
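For the "Send to Slack" placeholder, Slack incoming webhooks accept a JSON body with a `text` field. A sketch assuming jq and curl on the runner and a webhook URL exposed as a `SLACK_WEBHOOK_URL` secret (a hypothetical name); `drift_payload` and `post_to_slack` are hypothetical helpers.

```bash
# Build {"text": "..."} listing non-no-op resource changes from a JSON plan file.
drift_payload() {
  jq '{
    text: ("Terraform drift detected:\n" + (
      [ .resource_changes[]?
        | select(.change.actions != ["no-op"])
        | "- \(.address) (\(.change.actions | join(",")))" ]
      | join("\n")))
  }' "$1"
}

# POST the payload from stdin to the webhook; --data-binary preserves it byte for byte.
post_to_slack() {
  curl -fsS -H 'Content-type: application/json' --data-binary @- "$SLACK_WEBHOOK_URL"
}
```

Usage in the Report drift step: `terraform show -json tfplan > plan.json && drift_payload plan.json | post_to_slack`, with `SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}` set under the step's `env:`.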

### 9. Configure state backend versioning

Enable state versioning for rollback capability:

```bash
# S3 bucket versioning
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

# List state versions
aws s3api list-object-versions \
  --bucket my-terraform-state \
  --prefix terraform/prod/terraform.tfstate

# Download a previous version
aws s3api get-object \
  --bucket my-terraform-state \
  --key terraform/prod/terraform.tfstate \
  --version-id VERSION_ID \
  restored.tfstate

# Make it live through Terraform (not an S3 copy) so the DynamoDB lock checksum
# stays in sync; the older serial requires -force
terraform state push -force restored.tfstate
```
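The restore steps above can be combined into one helper that also saves the current live state first, in case the chosen version turns out to be wrong. A sketch assuming the AWS CLI and the bucket and key used in this guide; `restore_state_version`, `STATE_BUCKET`, and `STATE_KEY` are hypothetical names.

```bash
# Back up live state, download an older S3 version, and push it through Terraform.
restore_state_version() {
  local version_id=$1
  local bucket=${STATE_BUCKET:-my-terraform-state}
  local key=${STATE_KEY:-terraform/prod/terraform.tfstate}
  [ -n "$version_id" ] || { echo "usage: restore_state_version VERSION_ID" >&2; return 1; }
  # 1. Keep a copy of what is live now.
  terraform state pull > "pre-restore-$(date +%Y%m%d-%H%M%S).tfstate" || return 1
  # 2. Download the chosen version (the CLI prints response metadata to stdout).
  aws s3api get-object --bucket "$bucket" --key "$key" \
    --version-id "$version_id" restored.tfstate > /dev/null || return 1
  # 3. Push through Terraform; the older serial requires -force.
  terraform state push -force restored.tfstate
}
```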

### 10. Audit state access and changes

Track who modified state and when:

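```bash
# Check whether S3 server access logging is enabled on the state bucket
aws s3api get-bucket-logging \
  --bucket my-terraform-state

# CloudTrail lookup-events returns management events only. Object writes such as
# PutObject are data events and appear only if a trail logs S3 data events for this bucket.
# Bucket-level changes (policy, versioning) can be looked up directly (GNU date syntax):
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=my-terraform-state \
  --start-time $(date -d '1 hour ago' -Iseconds)

# Terraform Cloud/Enterprise audit log
# https://app.terraform.io/app/ORGANIZATION/settings/audit-trail
```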

Prevention

  • Never make manual infrastructure changes in production
  • Import existing resources with terraform import before managing them with Terraform
  • Enable state locking with DynamoDB (S3) or built-in (Azure, GCS)
  • Configure CI/CD with concurrency limits for Terraform jobs
  • Enable bucket versioning for state file rollback
  • Run terraform plan in PR review before any apply
  • Schedule regular drift detection scans
  • Use prevent_destroy lifecycle rule for critical resources

```hcl
# Protect critical resources from accidental destruction
resource "aws_rds_cluster" "main" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true
  }
}
```

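Related Errors

  • **Error acquiring the state lock**: another Terraform process holds the lock (see step 2)
  • **Backend configuration changed**: the backend block changed since the last terraform init; rerun init with -reconfigure or -migrate-state
  • **Resource no longer exists**: the infrastructure was deleted outside Terraform, so the next plan proposes creating it again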