Provisioners in Terraform execute scripts on local or remote resources. Connection errors prevent provisioners from reaching their targets; a failed creation-time provisioner fails the apply and leaves the resource marked as tainted.

Understanding Provisioner Connection Errors

Connection errors appear as:

```
Error: Failed to connect to remote host: timeout
Error: SSH authentication failed: permission denied
Error: WinRM connection error: endpoint not reachable
Error: connection refused: unable to connect
```

Issue 1: SSH Connection Timeout

The provisioner cannot open an SSH connection to the target host before the connection timeout expires.

Error Example:

```
Error: Failed to connect to remote host

timeout after 5 minutes
```

Root Cause:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "5m" # Default might be too short for slow instances
    }
  }
}
```

Solution:

Increase the timeout and wait for the instance to finish booting:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Wait for instance to be ready
  provisioner "remote-exec" {
    inline = [
      "while ! test -f /var/lib/cloud/instance/boot-finished; do sleep 1; done"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m" # Increase timeout; Terraform keeps retrying the connection until it expires

      # Where the script is uploaded on the remote host (not a retry setting)
      script_path = "/tmp/terraform-provisioner"
    }
  }

  # Main provisioner runs after boot
  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m"
    }
  }
}
```
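As an alternative to polling for the boot-finished marker file, the wait can be delegated to cloud-init itself. The following is a minimal sketch, assuming the AMI ships cloud-init (as stock Ubuntu images do); it reuses the resource names from the example above and is not meant to be applied alongside it.

```hcl
# Sketch only: assumes cloud-init is installed on the AMI.
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = [
      # Blocks until cloud-init reports that first-boot configuration has finished
      "cloud-init status --wait",
      "sudo apt-get update -y",
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m"
    }
  }
}
```

This keeps the wait and the main command in a single provisioner, so only one SSH session has to be established.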

Use null_resource for better control:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

resource "null_resource" "configure_web" {
  depends_on = [aws_instance.web]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
      timeout     = "15m"
    }
  }

  # Trigger on instance changes
  triggers = {
    instance_id = aws_instance.web.id
  }
}
```

Issue 2: SSH Authentication Failure

SSH login fails due to incorrect credentials or key issues.

Error Example:

```
Error: SSH authentication failed
Permission denied (publickey)
```

Solution:

Verify SSH credentials:

```bash
# Test SSH manually
ssh -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Check key permissions
chmod 600 ~/.ssh/id_rsa

# Verify key matches instance (text output keeps the console log line-by-line for grep)
aws ec2 get-console-output --instance-id i-abc123 --output text | grep ssh
```

Configure correct authentication:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Inject correct SSH key
  key_name = aws_key_pair.main.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ec2-user" # Use correct user for AMI
      private_key = tls_private_key.main.private_key_pem
      host        = self.public_ip
    }
  }
}

# Create key pair properly
resource "tls_private_key" "main" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "main" {
  key_name   = "terraform-key"
  public_key = tls_private_key.main.public_key_openssh
}

resource "local_file" "private_key" {
  content         = tls_private_key.main.private_key_pem
  filename        = ".ssh/terraform-key"
  file_permission = "0600"
}
```
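If the private key should never be read by Terraform or written to state, the SSH connection can authenticate through a local SSH agent instead. The sketch below assumes an agent is running on the machine executing Terraform, that the matching key has been loaded with `ssh-add`, and that a key pair named `terraform-key` already exists in AWS; the resource name `web_agent` is hypothetical.

```hcl
# Sketch only: assumes ssh-agent is running and holds the key for this instance.
resource "aws_instance" "web_agent" {
  ami           = var.ami
  instance_type = "t3.micro"
  key_name      = "terraform-key" # assumed to exist already

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type  = "ssh"
      user  = "ubuntu"
      host  = self.public_ip
      agent = true # authenticate with keys held by the local ssh-agent
    }
  }
}
```

With this approach no `private_key` argument is set, so the key material stays outside the Terraform configuration.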

Issue 3: WinRM Connection Errors

Windows instances require WinRM configuration for provisioners.

Error Example:

```
Error: WinRM connection error
endpoint: http://10.0.1.50:5985/wsman not reachable
```

Solution:

Ensure WinRM is enabled and configured:

```hcl
resource "aws_instance" "windows" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  # User data to configure WinRM
  user_data = <<-EOF
    <powershell>
    Enable-PSRemoting -Force
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force
    winrm set winrm/config/service/auth '@{Basic="true"}'
    winrm set winrm/config/service '@{AllowUnencrypted="true"}'
    </powershell>
  EOF

  provisioner "remote-exec" {
    inline = ["powershell -Command \"Get-Service\""]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5985
      timeout  = "10m"

      # For HTTPS (recommended)
      # https    = true
      # port     = 5986
      # insecure = true # For self-signed certs
    }
  }
}

# Use AWS Secrets Manager for password
# (aws_secretsmanager_secret.admin and random_password.admin are assumed to be defined elsewhere)
resource "aws_secretsmanager_secret_version" "admin_password" {
  secret_id     = aws_secretsmanager_secret.admin.id
  secret_string = random_password.admin.result
}
```
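For the HTTPS variant, a complete connection might look like the sketch below. It assumes an HTTPS WinRM listener with a self-signed certificate has already been configured on the instance (the user data above does not create one) and declares `admin_password` as a sensitive variable so it is redacted in plan output; the resource name `windows_https` is hypothetical.

```hcl
# Sketch only: assumes a WinRM HTTPS listener exists on port 5986.
variable "admin_password" {
  type      = string
  sensitive = true # redacts the value in plan and apply output
}

resource "aws_instance" "windows_https" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  provisioner "remote-exec" {
    inline = ["powershell -Command \"Get-Service\""]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5986
      https    = true
      insecure = true # skip certificate validation; only for self-signed certs
      timeout  = "10m"
    }
  }
}
```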

Issue 4: Connection Through Bastion Host

Instances in private subnets cannot be reached directly, so the provisioner must connect through a bastion host.

Error Example:

```
Error: Failed to connect to remote host
Cannot reach private instance 10.0.2.50 directly
```

Solution:

Configure bastion host connection:

```hcl
resource "aws_instance" "bastion" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.public_subnet_id

  key_name = var.key_name
}

resource "aws_instance" "private_instance" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.private_subnet_id

  key_name = var.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.private_ip

      # Bastion configuration
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/id_rsa")
    }
  }
}
```

Issue 5: Self-Reference Connection Issues

The connection block relies on `self` attributes that are not available when the provisioner runs.

Error Example:

```
Error: self.public_ip cannot be used during resource creation
```

Solution:

Use null_resource with explicit dependency:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

# Provisioning after instance creation
resource "null_resource" "web_provisioner" {
  depends_on = [aws_instance.web]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
    }
  }
}
```
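On Terraform 1.4 or later, the built-in `terraform_data` resource can play the same role without the null provider. This is a sketch under that version assumption; the resource name `web_provisioner` simply mirrors the example above.

```hcl
# Sketch only: requires Terraform 1.4+ for the terraform_data resource.
resource "terraform_data" "web_provisioner" {
  # Replace this resource, and so re-run the provisioner, when the instance ID changes
  triggers_replace = [aws_instance.web.id]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
    }
  }
}
```

Because the provisioner references `aws_instance.web` directly, Terraform infers the dependency and no explicit `depends_on` is needed.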

Issue 6: Connection During Destroy

Provisioners fail during destroy when resources are gone.

Error Example:

```
Error: Connection refused during destroy provisioner
```

Solution:

Handle destroy provisioners carefully:

```hcl
resource "null_resource" "cleanup" {
  # Destroy-time provisioners may only reference self, count.index, and each.key,
  # so store the host in triggers and read it back through self.
  triggers = {
    host = aws_instance.web.public_ip
  }

  provisioner "local-exec" {
    when    = destroy
    command = "echo 'Cleanup complete'"
  }

  # Remote destroy requires connection still active
  provisioner "remote-exec" {
    when = destroy

    # May fail if instance is already destroyed
    # on_failure belongs on the provisioner, not in the connection block
    on_failure = continue

    inline = ["rm -rf /tmp/app-data"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.triggers.host
    }
  }
}
```

Issue 7: Network Security Groups Blocking Connections

Security groups don't allow provisioner connections.

Error Example:

```
Error: Connection timed out
Security group rules may be blocking access
```

Solution:

Add security group rules for provisioner access:

```hcl
resource "aws_security_group" "allow_ssh" {
  name_prefix = "allow-ssh-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Or restrict to specific IPs
    description = "SSH for Terraform provisioner"
  }

  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # WinRM HTTP
  }

  ingress {
    from_port   = 5986
    to_port     = 5986
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # WinRM HTTPS
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = var.ami
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.allow_ssh.id]

  provisioner "remote-exec" {
    inline = ["echo 'connected'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
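Opening port 22 to `0.0.0.0/0` is convenient for testing but exposes the instance to the whole internet. A narrower rule might look like the sketch below, where `provisioner_cidr` is a hypothetical variable holding the address range of the machine that runs Terraform, and `restricted_ssh` is a hypothetical group name.

```hcl
# Sketch only: provisioner_cidr and restricted_ssh are illustrative names.
variable "provisioner_cidr" {
  description = "CIDR of the machine running Terraform, e.g. 203.0.113.10/32"
  type        = string
}

resource "aws_security_group" "restricted_ssh" {
  name_prefix = "restricted-ssh-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.provisioner_cidr] # only the Terraform runner may connect
    description = "SSH for Terraform provisioner"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```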

Verification Steps

Test connection before provisioning:

```bash
# Test SSH connectivity
ssh -v -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Test WinRM connectivity (requires the winrm Ruby gem)
ruby -e "require 'winrm'; conn = WinRM::Connection.new(endpoint: 'http://INSTANCE_IP:5985/wsman', user: 'Administrator', password: 'password'); conn.shell(:powershell) { |s| puts s.run('hostname').stdout }"
```

Verify security group rules:

```bash
aws ec2 describe-security-groups --group-ids sg-abc123 | jq '.SecurityGroups[].IpPermissions'
```

Prevention Best Practices

  1. Avoid provisioners when possible - use cloud-init, user data, or configuration management tools
  2. Use longer timeouts for slow-booting instances
  3. Configure security groups before provisioning
  4. Store SSH keys securely and use proper file permissions
  5. Use null_resource to separate provisioning from resource creation
  6. Test connections manually before running Terraform
  7. Handle destroy provisioners with on_failure = continue
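The first practice can be illustrated with user data, which runs on the instance at first boot and needs no connection from Terraform at all. This is a minimal sketch, assuming an Ubuntu AMI where cloud-init executes the script as root; the resource name `web_userdata` is hypothetical.

```hcl
# Sketch only: assumes an Ubuntu AMI with cloud-init.
resource "aws_instance" "web_userdata" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Executed once at first boot by cloud-init; no SSH or WinRM session is required
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
  EOF
}
```

The trade-off is that Terraform does not wait for the script or see its exit status, so failures have to be checked on the instance, for example in the cloud-init logs.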