Introduction
Terraform state drift occurs when the actual infrastructure differs from what the state file records, causing `terraform plan` to show unexpected changes. Drift happens when resources are modified outside Terraform (manual console changes, API calls, other automation tools) or when the state file becomes inconsistent due to concurrent runs, failed applies, or backend synchronization issues. Uncorrected drift can lead to deployment failures, resource recreation, or production outages.
Symptoms
- `terraform plan` shows changes for resources that should be unchanged
- `terraform apply` attempts to recreate existing resources
- State lock errors during plan/apply operations
- Remote backend shows different state than local `terraform.tfstate`
- CI/CD pipeline fails with "state locked by another process" errors
- Issue appears after manual hotfix, emergency console change, or concurrent pipeline runs
Common Causes
- Manual infrastructure changes via cloud console or CLI
- Another Terraform run modified state concurrently
- State file corruption during failed apply or network interruption
- Remote backend (S3, Azure Blob, GCS) synchronization delays
- Different Terraform versions with incompatible state formats
- Module or provider version changes altering resource schemas
- Imported resources with incomplete state mapping
Step-by-Step Fix
### 1. Detect drift with terraform plan
Run plan to identify drifted resources:
```bash
# Generate plan and save to file
terraform plan -out=tfplan

# Review planned changes (actions live under .change in the plan JSON)
terraform show -json tfplan | jq '.resource_changes[] | select(.change.actions != ["no-op"]) | {address, actions: .change.actions}'

# Or use drift detection flag
terraform plan -detailed-exitcode
# Exit code 2 = changes detected, 1 = error, 0 = no changes
```
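The exit codes above can be turned into a label for scripts or pipelines. The following is a minimal sketch, not a definitive implementation: the function names `classify_plan_exit` and `check_drift` are hypothetical, and exit code 2 only means that the plan is non-empty, which indicates drift only when the configuration itself has not changed.

```bash
# Map a `terraform plan -detailed-exitcode` exit status to a label.
# (classify_plan_exit is a hypothetical helper name.)
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;    # plan succeeded, no changes
    2) echo "changes" ;;  # plan succeeded, changes pending (possible drift)
    *) echo "error" ;;    # plan failed
  esac
}

# Run the plan in the current directory and print the label.
# (check_drift is a hypothetical helper name; it needs terraform on PATH.)
check_drift() {
  local status=0
  terraform plan -detailed-exitcode -out=tfplan >/dev/null || status=$?
  classify_plan_exit "$status"
}
```

Calling `check_drift` from an initialized working directory prints one of the three labels, which a wrapper script can branch on.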
For comprehensive drift report:
```bash
# TDrift (third-party tool)
tdrift --plan-out=tfplan --format=json

# Or parse terraform show output
terraform show -no-color tfplan > drift-report.txt
```
### 2. Check state lock status
Verify if state is locked by another process:
```bash
# Terraform shows lock info automatically
terraform plan

# Example lock error:
# Error: Error locking state: Error acquiring the state lock
# Lock Info:
#   ID:        1a2b3c4d-5e6f-7g8h-9i0j
#   Path:      s3://bucket/terraform/prod/terraform.tfstate
#   Operation: OperationTypeApply
#   Who:       user@host
#   Version:   1.5.7
#   Created:   2026-03-30 10:00:00.000 +0000 UTC

# Force unlock (use with caution - ensure no actual running apply)
terraform force-unlock 1a2b3c4d-5e6f-7g8h-9i0j
```
Before force-unlocking:

- Verify no CI/CD pipeline is actually running
- Check cloud provider for active API operations
- Confirm with team that no manual Terraform is running
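Once those checks pass, the lock ID has to be copied out of the error text. The sketch below pulls it from Terraform's output on stdin. The helper name `extract_lock_id` is hypothetical, and it assumes the `ID:` label appears as shown in the example error above, optionally preceded by decoration characters.

```bash
# Print the first lock ID found in Terraform's lock error text (read from stdin).
# (extract_lock_id is a hypothetical helper name.)
extract_lock_id() {
  awk '{
    # Scan every field so a leading border character before "ID:" is tolerated.
    for (i = 1; i < NF; i++) {
      if ($i == "ID:") { print $(i + 1); exit }
    }
  }'
}

# Usage (needs terraform on PATH):
#   terraform plan -no-color 2>&1 | extract_lock_id
```

When the lock holder may still be running, waiting with `terraform plan -lock-timeout=5m` is usually a safer first step than forcing the unlock.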
### 3. Refresh state from cloud provider
Sync state file with actual infrastructure:
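```bash
# Refresh state in place (deprecated since Terraform 0.15.4; writes to state without review)
terraform refresh

# Preferred: preview what a refresh would change in state
terraform plan -refresh-only

# Apply the refresh for a specific resource after reviewing the proposal
terraform apply -refresh-only -target=aws_instance.web
```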
After refresh, review changes:
```bash
# See what changed in state
terraform show -json terraform.tfstate | jq '.values.root_module.resources[] | {address, values}'

# Compare with previous state (if versioned)
aws s3 cp s3://bucket/terraform/prod/terraform.tfstate-version-42 .
diff terraform.tfstate terraform.tfstate-version-42 | head -50
```
### 4. Handle manually changed resources
For resources modified outside Terraform, choose reconciliation strategy:
**Option A: Accept infrastructure changes (import to state)**
```bash
# If resource was created manually, import it
terraform import aws_instance.web i-0abc123def456

# Update configuration to match actual state
# Edit main.tf to match the imported resource attributes
```
**Option B: Revert infrastructure to match state**
```bash
# Apply to overwrite manual changes
terraform apply -target=aws_instance.web

# Warning: This may destroy and recreate resources
```
**Option C: Update state to match infrastructure**
```bash
# Use terraform state commands
terraform state show aws_instance.web
terraform state rm aws_instance.web
terraform import aws_instance.web i-0abc123def456
```
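Because `terraform state rm` followed by `terraform import` rewrites state, it can help to wrap the sequence with a backup and a dry-run mode. This is a minimal sketch under stated assumptions: the names `run`, `reimport`, and `DRY_RUN` are hypothetical, and the resource address and ID are the placeholder values used above.

```bash
# Run a command, or only print it when DRY_RUN=1.
# (run and DRY_RUN are hypothetical names for this sketch.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Back up state, then drop and re-import a single resource.
# $1 = resource address, $2 = provider-side resource ID.
reimport() {
  local address="$1" id="$2" backup
  backup="backup-$(date +%Y%m%d-%H%M%S).tfstate"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ terraform state pull > %s\n' "$backup"
  else
    # Stop before touching state if the backup cannot be written.
    terraform state pull > "$backup" || return 1
  fi
  run terraform state rm "$address" || return 1
  run terraform import "$address" "$id"
}
```

Running `DRY_RUN=1 reimport aws_instance.web i-0abc123def456` prints the three commands without executing them, which makes the sequence easy to review first.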
### 5. Resolve concurrent run conflicts
Prevent multiple Terraform runs from corrupting state:
```hcl
# Locking support depends on the backend block in your configuration;
# review it directly, since Terraform has no subcommand that prints it.

# S3 backend with DynamoDB locking (recommended)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # Enables state locking
  }
}

# Azure Blob backend (has built-in locking)
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-rg"
    storage_account_name = "tfstate"
    container_name       = "tfstate"
    key                  = "prod/terraform.tfstate"
  }
}
```
CI/CD pipeline serialization:
```yaml
# GitHub Actions - ensure single run
jobs:
  terraform:
    runs-on: ubuntu-latest
    concurrency:
      group: terraform-${{ github.ref }}
      cancel-in-progress: false  # Don't cancel mid-apply
```
### 6. Handle state version conflicts
Remote backends may have version conflicts:
```bash
# Pull latest state from remote
terraform state pull > remote-state.tfstate

# Compare with local state
diff terraform.tfstate remote-state.tfstate

# If remote is authoritative, overwrite local
terraform state pull > terraform.tfstate

# If local should win, push to remote (dangerous)
terraform state push terraform.tfstate
```
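Before pushing, it is worth comparing the `serial` counters of the two files, since Terraform refuses a push when the remote serial is higher or the lineage differs. The sketch below is a pre-check only. The helper names `state_serial` and `local_is_newer` are hypothetical, and the `sed` pattern assumes the pretty-printed layout Terraform writes, with `"serial"` on its own line.

```bash
# Print the "serial" field of a pretty-printed Terraform state file.
# (state_serial is a hypothetical helper name.)
state_serial() {
  sed -n 's/^ *"serial": *\([0-9][0-9]*\),\{0,1\} *$/\1/p' "$1" | head -n 1
}

# Succeed only if the first file's serial is strictly higher than the second's.
# (local_is_newer is a hypothetical helper name.)
local_is_newer() {
  local local_serial remote_serial
  local_serial=$(state_serial "$1")
  remote_serial=$(state_serial "$2")
  # Fail closed when either serial could not be read.
  [ -n "$local_serial" ] && [ -n "$remote_serial" ] &&
    [ "$local_serial" -gt "$remote_serial" ]
}

# Usage:
#   local_is_newer terraform.tfstate remote-state.tfstate \
#     && echo "local state is ahead of remote"
```

Where `jq` is available, `jq -r .serial FILE` is a more robust way to read the same field.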
State push with force (use only when certain):
```bash
# Back up the current remote state first
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate

# Then force push local state to remote
terraform state push -force terraform.tfstate
```
### 7. Fix corrupted or inconsistent state
Repair state file issues:
```bash
# Validate the configuration (this checks .tf files, not the state file)
terraform validate

# List resources tracked in state; fails if state cannot be read,
# and helps spot orphaned entries
terraform state list

# Remove resource from state (does not destroy infrastructure)
terraform state rm aws_instance.web

# Move resource to different address (after refactoring)
terraform state mv aws_instance.web aws_instance.web_server

# Import missing resource
terraform import aws_instance.web i-0abc123def456
```
For severe corruption:
```bash
# Export state to JSON for manual repair
terraform state pull > state.json

# After fixing JSON, push back
cat state-fixed.json | terraform state push -
```
### 8. Implement drift detection automation
Schedule regular drift detection:
```yaml
# GitHub Actions - daily drift check
name: Terraform Drift Detection
on:
  schedule:
    - cron: '0 6 * * *'  # Daily at 6 AM UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=tfplan -detailed-exitcode
        continue-on-error: true
        id: plan
      - name: Report drift
        # Exit code 2 = changes detected; 1 = the plan itself failed and tfplan may not exist.
        # The exitcode output is provided by the setup-terraform wrapper (enabled by default).
        if: steps.plan.outputs.exitcode == '2'
        run: |
          terraform show -json tfplan | jq '.resource_changes' > drift-report.json
          # Send to Slack, email, or create GitHub issue
```
### 9. Configure state backend versioning
Enable state versioning for rollback capability:
```bash
# S3 bucket versioning
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled

# List state versions
aws s3api list-object-versions \
  --bucket my-terraform-state \
  --prefix terraform/prod/terraform.tfstate

# Restore previous version
aws s3api get-object \
  --bucket my-terraform-state \
  --key terraform/prod/terraform.tfstate \
  --version-id VERSION_ID \
  terraform.tfstate
```
### 10. Audit state access and changes
Track who modified state and when:
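```bash
# S3 bucket access logs
aws s3api get-bucket-logging \
  --bucket my-terraform-state

# CloudTrail management events for the state bucket.
# Object-level calls such as PutObject are data events: lookup-events does not
# return them, so they must be enabled on a trail and read from its logs.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=my-terraform-state \
  --start-time "$(date -d '1 hour ago' -Iseconds)"

# Terraform Cloud/Enterprise audit log
# https://app.terraform.io/app/ORGANIZATION/settings/audit-trail
```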
Prevention
- Never make manual infrastructure changes in production
- Use `terraform import` for existing resources before managing them
- Enable state locking with DynamoDB (S3) or built-in (Azure, GCS)
- Configure CI/CD with concurrency limits for Terraform jobs
- Enable bucket versioning for state file rollback
- Run `terraform plan` in PR review before any apply
- Schedule regular drift detection scans
- Use the `prevent_destroy` lifecycle rule for critical resources
```hcl
# Protect critical resources from accidental destruction
resource "aws_rds_cluster" "main" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true
  }
}
```
Related Errors
- **Error acquiring the state lock**: Another Terraform process holds lock
- **Backend configuration changed**: Remote backend reconfigured
- **Resource no longer exists**: Infrastructure deleted outside Terraform