What's Actually Happening

You run terraform apply and it starts creating or updating resources, but then it hangs. The operation doesn't progress, doesn't fail with a clear error, and eventually times out. This leaves your infrastructure in an incomplete state and blocks further operations.

The Error You'll See

```
Error: waiting for EC2 Instance (i-0123456789abcdef0) creation: timeout while waiting for state to become 'running' (last state: 'pending', timeout: 10m0s)

Error: context deadline exceeded

Error: waiting for RDS Cluster (my-cluster) to become available: timeout while waiting for state to become 'available' (timeout: 30m0s)

Error: timeout - last error: dial tcp 10.0.0.1:22: i/o timeout
```

Or the eternal spinner:

```bash
module.eks.aws_eks_cluster.main: Still creating... [10m30s elapsed]
module.eks.aws_eks_cluster.main: Still creating... [11m0s elapsed]
module.eks.aws_eks_cluster.main: Still creating... [12m0s elapsed]
... (continues without end)
```

Why This Happens

Timeouts occur due to:

  1. Slow resource provisioning - Large databases and EKS clusters genuinely take 30+ minutes
  2. Default timeouts too short - Terraform's 10-minute default is insufficient for many resources
  3. Network connectivity issues - Terraform cannot reach the resource for status polling
  4. Provider polling bugs - Incorrect status detection in provider code
  5. API rate limiting - Cloud provider throttling API calls
  6. Resource quotas exceeded - Hit service limits preventing completion
  7. Dependency bottlenecks - Waiting on slow prerequisite resources
  8. Provisioner failures - Remote-exec or local-exec provisioners timing out
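Several of these causes can be ruled out from the CLI before touching Terraform. A sketch for cause 6 (quotas) on EC2, assuming the standard quota code L-1216C47A for Running On-Demand Standard instances; note this quota is measured in vCPUs, not instance count:

```shell
# Check the account's on-demand Standard instance quota (in vCPUs).
# L-1216C47A is the quota code for Running On-Demand Standard instances.
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --query 'Quota.Value'

# List running instance types to estimate current vCPU usage against it
aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=running \
  --query 'Reservations[].Instances[].InstanceType'
```

If usage is at or near the quota, new instances sit in 'pending' until capacity frees up, and no timeout value will help.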

Step 1: Increase Resource Timeout Values

Configure explicit timeouts for slow resources:

```hcl
# EC2 instances (default 10m may be too short)
resource "aws_instance" "large" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  timeouts {
    create = "30m" # Increase from default 10m
    update = "30m"
    delete = "30m"
  }
}

# RDS databases (can take 1-2 hours for Multi-AZ)
resource "aws_db_instance" "production" {
  allocated_storage = 100
  engine            = "postgres"
  engine_version    = "15.0"
  instance_class    = "db.r5.large"

  timeouts {
    create = "3h" # Multi-AZ creation is very slow
    update = "3h"
    delete = "2h"
  }
}

# EKS clusters (20-40 minutes typical)
resource "aws_eks_cluster" "main" {
  name     = "my-cluster"
  role_arn = aws_iam_role.eks.arn

  timeouts {
    create = "45m"
    update = "60m"
    delete = "45m"
  }
}

# EKS node groups
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "main"

  timeouts {
    create = "30m"
    update = "60m"
    delete = "30m"
  }
}

# CloudFront distributions (15-30 minutes)
resource "aws_cloudfront_distribution" "cdn" {
  # ... distribution configuration ...

  timeouts {
    create = "1h"
    update = "1h"
  }
}

# Lambda with VPC attachment (can be slow)
resource "aws_lambda_function" "vpc_lambda" {
  function_name = "vpc-function"

  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.lambda.id]
  }

  timeouts {
    create = "15m"
    update = "15m"
  }
}
```

Step 2: Check Resource Status During Timeout

Verify what's happening with the actual resource:

```bash
# For EC2 instances
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].{State:State.Name,LaunchTime:LaunchTime}'

# For RDS databases
aws rds describe-db-instances \
  --db-instance-identifier my-db \
  --query 'DBInstances[].{Status:DBInstanceStatus,Progress:PercentProgress}'

# For EKS clusters
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.{Status:status,Endpoint:endpoint}'

# For CloudFront
aws cloudfront get-distribution \
  --id E1234567890ABC \
  --query 'Distribution.{Status:Status,InProgressInvalidations:InProgressInvalidationBatches}'
```

If the resource is actually ready but Terraform timed out:

```bash
# Import the completed resource
terraform import aws_db_instance.production my-db-instance
```

Step 3: Handle Provisioner Timeouts

For SSH-based provisioners that timeout:

```hcl
resource "aws_instance" "with_provisioner" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y nginx",
    ]

    connection {
      type        = "ssh"
      user        = "ec2-user"
      host        = self.public_ip
      private_key = file(var.private_key_path)
      timeout     = "10m" # Connection timeout
    }
  }

  timeouts {
    create = "30m" # Total resource timeout including provisioner
  }
}
```
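If the provisioner itself is the bottleneck, one alternative worth considering is cloud-init user data, which runs at first boot and removes the SSH polling loop entirely. A sketch with the same packages as above (resource name is illustrative):

```hcl
resource "aws_instance" "with_user_data" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  # Executed at first boot via cloud-init; there is no connection
  # block, so there is no SSH connection timeout for Terraform to hit.
  user_data = <<-EOF
    #!/bin/bash
    yum update -y
    yum install -y nginx
  EOF
}
```

The trade-off is that Terraform reports the instance as created once it is 'running', before the boot script finishes.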

Step 4: Check Network Connectivity

Verify Terraform can reach resources for status polling:

```bash
# For resources requiring SSH access
ssh -v -i my-key.pem ec2-user@10.0.0.1

# Check security groups allow required ports
aws ec2 describe-security-groups \
  --group-ids sg-12345678 \
  --query 'SecurityGroups[].IpPermissions[]'

# Verify NAT Gateway is functioning for private resources
aws ec2 describe-nat-gateways --nat-gateway-ids nat-12345678

# Check VPN/Direct Connect status if applicable
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-12345678
```

Step 5: Reduce API Rate Limiting

When timeouts are caused by provider API throttling:

```bash
# Reduce parallel operations (default is 10)
terraform apply -parallelism=5

# Or even lower for rate-sensitive operations
terraform apply -parallelism=2
```

Configure provider retry behavior:

```hcl
provider "aws" {
  region = "us-east-1"

  # Increase retry attempts for rate limiting
  max_retries = 25

  # Or use adaptive retry mode
  retry_mode = "adaptive"
}
```
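The same retry settings can also come from the environment, which the AWS provider reads at runtime; this is convenient in CI where editing provider blocks is awkward. A sketch:

```shell
# Equivalent to max_retries / retry_mode in the provider block;
# the AWS provider picks these up from the environment.
export AWS_MAX_ATTEMPTS=25
export AWS_RETRY_MODE=adaptive
# ...then run `terraform apply` as usual in the same shell.
```

Environment variables take effect without a configuration change, so they are easy to apply temporarily during a throttled rollout.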

Step 6: Break Large Operations into Stages

Apply resources incrementally to avoid timeout accumulation:

```bash
# Stage 1: Networking
terraform apply -target=module.networking

# Stage 2: Compute
terraform apply -target=module.compute

# Stage 3: Database
terraform apply -target=aws_db_instance.production

# Stage 4: Full apply for remaining resources
terraform apply
```

Or use -target for problematic resources:

```bash
# Apply just the slow resource with more time
terraform apply -target=aws_eks_cluster.main

# Wait, then apply the rest
terraform apply
```

Step 7: Recover from Timeout State

When apply times out leaving partial infrastructure:

```bash
# Check current state
terraform state list

# Find incomplete resources
terraform state show aws_instance.problematic

# If resource exists but state is wrong, force recreation
terraform taint aws_instance.problematic

# If resource doesn't exist but state thinks it does
terraform state rm aws_instance.problematic

# Re-import if resource was actually created
terraform import aws_instance.problematic i-0123456789abcdef0
```
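On recent Terraform versions, the same recovery can be expressed declaratively instead of with state commands, so the change goes through plan review. These are alternatives: use one or the other for a given address. A sketch:

```hcl
# Adopt a resource that was actually created before the timeout
# (import blocks, Terraform 1.5+); reviewed on the next plan/apply.
import {
  to = aws_instance.problematic
  id = "i-0123456789abcdef0"
}

# Or drop a resource from state without destroying it
# (removed blocks, Terraform 1.7+), the declarative
# equivalent of `terraform state rm`.
removed {
  from = aws_instance.problematic

  lifecycle {
    destroy = false
  }
}
```

The declarative forms are easier to audit in version control than one-off `terraform state` commands run from a workstation.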

Step 8: Enable Debugging for Timeout Analysis

Get detailed information on what's timing out:

```bash
# Enable Terraform debug logs
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-debug.log
terraform apply

# Analyze timeout patterns (-E enables the | alternation)
grep -Ei "timeout|deadline|exceeded|waiting" terraform-debug.log

# Find which API calls are slow
grep -Ei "polling|status|describe" terraform-debug.log
```
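When the log is long, a short awk pass can summarize the last elapsed time each resource reported. A sketch, assuming the standard `Still creating... [XmYs elapsed]` progress lines; the sample log here is illustrative:

```shell
# Sample apply output (stand-in for a real terraform-debug.log)
cat > /tmp/apply.log <<'EOF'
module.eks.aws_eks_cluster.main: Still creating... [10m30s elapsed]
aws_db_instance.production: Still creating... [55m0s elapsed]
module.eks.aws_eks_cluster.main: Still creating... [12m0s elapsed]
EOF

# Keep the last elapsed figure reported per resource address:
# strip brackets/colons, then let later lines overwrite earlier ones.
summary=$(awk '/Still creating/ { gsub(/[\[\]:]/, ""); last[$1] = $4 }
               END { for (r in last) print r, last[r] }' /tmp/apply.log)
echo "$summary"
```

This prints one line per resource with its final reported elapsed time, which quickly shows where the wall-clock time went.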

Step 9: Handle Specific Resource Timeouts

RDS Multi-AZ Creation:

```hcl
resource "aws_db_instance" "multi_az" {
  multi_az          = true # Makes creation much slower
  allocated_storage = 100
  engine            = "mysql"

  timeouts {
    create = "4h" # Multi-AZ with large storage takes hours
    update = "4h"
    delete = "2h"
  }

  skip_final_snapshot = true # Faster deletion
}
```

EKS Cluster and Node Groups:

```hcl
resource "aws_eks_cluster" "main" {
  name = "production"

  timeouts {
    create = "45m"
    delete = "45m"
  }
}

# Create node group separately after cluster
resource "aws_eks_node_group" "workers" {
  depends_on = [aws_eks_cluster.main]

  timeouts {
    create = "30m"
    update = "60m"
  }
}
```

S3 Bucket with Many Objects:

```bash
# For buckets with thousands of objects, deletion can timeout
terraform apply -target=aws_s3_bucket.main

# Then manually empty before destroy
aws s3 rm s3://my-bucket --recursive
terraform destroy -target=aws_s3_bucket.main
```
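Alternatively, the provider can empty the bucket itself at destroy time: `force_destroy` deletes all objects (including object versions) before removing the bucket. A sketch; note that this also makes accidental destroys more destructive:

```hcl
resource "aws_s3_bucket" "main" {
  bucket = "my-bucket"

  # Provider deletes all objects (and object versions) on destroy,
  # avoiding the manual `aws s3 rm` step.
  force_destroy = true
}
```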

Verify the Fix

After adding timeout configuration:

```bash
# Validate configuration
terraform validate

# Run plan to check timeouts are recognized
terraform plan

# Apply with increased timeouts
terraform apply
```

Verify resources complete within configured time:

```bash
# Monitor during creation
watch -n 30 'aws eks describe-cluster --name my-cluster --query cluster.status'

# After successful apply
terraform state list
```

Prevention Best Practices

Pre-configure timeouts in your resource templates:

```hcl
# Standard timeout template for all large resources
resource "aws_db_instance" "template" {
  timeouts {
    create = "2h"
    update = "2h"
    delete = "1h"
  }
}

resource "aws_eks_cluster" "template" {
  timeouts {
    create = "45m"
    update = "60m"
    delete = "45m"
  }
}
```
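To keep these values consistent across a codebase, the timeout strings can come from a shared `locals` map instead of being repeated per resource; a sketch (names are illustrative):

```hcl
locals {
  db_timeouts = {
    create = "2h"
    update = "2h"
    delete = "1h"
  }
}

resource "aws_db_instance" "example" {
  allocated_storage = 100
  engine            = "postgres"
  instance_class    = "db.r5.large"

  # Timeout values referenced from one place, so tuning them
  # later is a single-line change.
  timeouts {
    create = local.db_timeouts.create
    update = local.db_timeouts.update
    delete = local.db_timeouts.delete
  }
}
```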

Document expected creation times:

```markdown
## Resource Creation Times
- RDS Single-AZ: 15-30 minutes
- RDS Multi-AZ: 45-120 minutes
- EKS Cluster: 20-40 minutes
- EKS Node Group: 10-20 minutes per group
- CloudFront Distribution: 15-30 minutes
```