Provisioners in Terraform execute scripts on local or remote resources. When a provisioner cannot connect to its target, the apply fails and the resource being created is marked as tainted.
Understanding Provisioner Connection Errors
Connection errors appear as:
```
Error: Failed to connect to remote host: timeout
Error: SSH authentication failed: permission denied
Error: WinRM connection error: endpoint not reachable
Error: connection refused: unable to connect
```
Issue 1: SSH Connection Timeout
SSH provisioners cannot reach the target host within timeout limits.
Error Example:
```
Error: Failed to connect to remote host
timeout after 5 minutes
```
Root Cause:
```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "5m" # 5m is the default; it may be too short for slow-booting instances
    }
  }
}
```
Solution:
Increase the timeout and wait for the instance to finish booting:
```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Wait for instance to be ready
  provisioner "remote-exec" {
    inline = [
      "while ! test -f /var/lib/cloud/instance/boot-finished; do sleep 1; done"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m" # Terraform keeps retrying the connection until this elapses

      # Path where scripts are copied on the remote host (not a retry setting)
      script_path = "/tmp/terraform-provisioner"
    }
  }

  # Main provisioner runs after boot
  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m"
    }
  }
}
```
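As an alternative to polling for the boot-finished marker file, the following is a minimal sketch that waits with cloud-init's own status command. It assumes the AMI ships cloud-init (as stock Ubuntu images do); the resource name `web_wait` is a placeholder chosen for illustration.

```hcl
resource "aws_instance" "web_wait" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = [
      # Blocks until cloud-init has finished; assumes cloud-init is installed on the AMI
      "cloud-init status --wait",
      "sudo apt-get update -y",
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "10m"
    }
  }
}
```

Putting the wait and the real work in one provisioner avoids repeating the connection block, at the cost of less granular error messages.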
Use null_resource for better control:
```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

resource "null_resource" "configure_web" {
  depends_on = [aws_instance.web]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
      timeout     = "15m"
    }
  }

  # Trigger on instance changes
  triggers = {
    instance_id = aws_instance.web.id
  }
}
```
Issue 2: SSH Authentication Failure
SSH login fails due to incorrect credentials or key issues.
Error Example:
```
Error: SSH authentication failed
Permission denied (publickey)
```
Solution:
Verify SSH credentials:
```bash
# Test SSH manually
ssh -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Check key permissions
chmod 600 ~/.ssh/id_rsa

# Inspect the instance console output for SSH-related boot messages
aws ec2 get-console-output --instance-id i-abc123 --output text | grep -i ssh
```
Configure correct authentication:
```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Inject correct SSH key
  key_name = aws_key_pair.main.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type = "ssh"
      # Use the correct user for the AMI: "ubuntu" for Ubuntu images
      # (as the apt-get command here implies), "ec2-user" for Amazon Linux
      user        = "ubuntu"
      private_key = tls_private_key.main.private_key_pem
      host        = self.public_ip
    }
  }
}

# Create key pair properly
resource "tls_private_key" "main" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "main" {
  key_name   = "terraform-key"
  public_key = tls_private_key.main.public_key_openssh
}

resource "local_file" "private_key" {
  content         = tls_private_key.main.private_key_pem
  filename        = ".ssh/terraform-key"
  file_permission = "0600"
}
```
Issue 3: WinRM Connection Errors
Windows instances require WinRM configuration for provisioners.
Error Example:
```
Error: WinRM connection error
endpoint: http://10.0.1.50:5985/wsman not reachable
```
Solution:
Ensure WinRM is enabled and configured:
```hcl
resource "aws_instance" "windows" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  # User data to configure WinRM
  user_data = <<-EOF
    <powershell>
    Enable-PSRemoting -Force
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force
    winrm set winrm/config/service/auth '@{Basic="true"}'
    winrm set winrm/config/service '@{AllowUnencrypted="true"}'
    </powershell>
  EOF

  provisioner "remote-exec" {
    inline = ["powershell -Command \"Get-Service\""]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5985
      timeout  = "10m"

      # For HTTPS (recommended)
      # https    = true
      # port     = 5986
      # insecure = true # For self-signed certs
    }
  }
}

# Use AWS Secrets Manager for password
# (aws_secretsmanager_secret.admin and random_password.admin are defined elsewhere)
resource "aws_secretsmanager_secret_version" "admin_password" {
  secret_id     = aws_secretsmanager_secret.admin.id
  secret_string = random_password.admin.result
}
```
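With the HTTPS options enabled, the connection block might look like the sketch below. It assumes an HTTPS listener on port 5986 has already been configured on the instance (the user data shown earlier only sets up the HTTP listener), and the resource name `windows_https` is a placeholder for illustration.

```hcl
resource "aws_instance" "windows_https" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  provisioner "remote-exec" {
    inline = ["powershell -Command \"Get-Service\""]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5986
      https    = true
      insecure = true # Skips certificate validation; only appropriate for self-signed certs
      timeout  = "10m"
    }
  }
}
```

If the instance presents a certificate signed by a trusted CA, `insecure` can be dropped so the certificate chain is validated.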
Issue 4: Connection Through Bastion Host
Instances in private subnets are not directly reachable, so provisioners must connect through a bastion host.
Error Example:
```
Error: Failed to connect to remote host
Cannot reach private instance 10.0.2.50 directly
```
Solution:
Configure bastion host connection:
```hcl
resource "aws_instance" "bastion" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.public_subnet_id

  key_name = var.key_name
}

resource "aws_instance" "private_instance" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.private_subnet_id

  key_name = var.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.private_ip

      # Bastion configuration
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/id_rsa")
    }
  }
}
```
Issue 5: Self-Reference Connection Issues
A connection block references self attributes before the resource has been created.
Error Example:
```
Error: self.public_ip cannot be used during resource creation
```
Solution:
Use null_resource with explicit dependency:
```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

# Provisioning after instance creation
resource "null_resource" "web_provisioner" {
  depends_on = [aws_instance.web]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
    }
  }
}
```
Issue 6: Connection During Destroy
Provisioners fail during destroy when resources are gone.
Error Example:
```
Error: Connection refused during destroy provisioner
```
Solution:
Handle destroy provisioners carefully:
```hcl
resource "null_resource" "cleanup" {
  # Destroy-time provisioners may only reference self, count.index, and each.key,
  # so keep the host address in triggers and read it back through self
  triggers = {
    host = aws_instance.web.public_ip
  }

  provisioner "local-exec" {
    when    = destroy
    command = "echo 'Cleanup complete'"
  }

  # Remote destroy requires the instance to still be reachable
  provisioner "remote-exec" {
    when = destroy

    # May fail if the instance is already destroyed.
    # on_failure is a provisioner argument, not a connection argument.
    on_failure = continue

    inline = ["rm -rf /tmp/app-data"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.triggers.host
    }
  }
}
```
Issue 7: Network Security Groups Blocking Connections
Security groups don't allow provisioner connections.
Error Example:
```
Error: Connection timed out
Security group rules may be blocking access
```
Solution:
Add security group rules for provisioner access:
```hcl
resource "aws_security_group" "allow_ssh" {
  name_prefix = "allow-ssh-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Or restrict to specific IPs
    description = "SSH for Terraform provisioner"
  }

  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # WinRM HTTP
  }

  ingress {
    from_port   = 5986
    to_port     = 5986
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # WinRM HTTPS
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = var.ami
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.allow_ssh.id]

  provisioner "remote-exec" {
    inline = ["echo 'connected'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
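Opening port 22 to 0.0.0.0/0 is convenient but broad. A minimal sketch of the restricted alternative follows, assuming the machine running Terraform has a known address; the variable `provisioner_cidr` and the resource name `restricted_ssh` are hypothetical names introduced for this example.

```hcl
variable "provisioner_cidr" {
  description = "CIDR of the machine running Terraform, for example 203.0.113.10/32 (hypothetical variable)"
  type        = string
}

resource "aws_security_group" "restricted_ssh" {
  name_prefix = "restricted-ssh-"
  vpc_id      = var.vpc_id

  # Only the Terraform runner may reach SSH
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.provisioner_cidr]
    description = "SSH from the Terraform runner only"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

The same pattern applies to the WinRM ports 5985 and 5986 for Windows instances.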
Verification Steps
Test connection before provisioning:
```bash
# Test SSH connectivity
ssh -v -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Test WinRM connectivity (requires the winrm Ruby gem, v2 API)
ruby -e "require 'winrm'; conn = WinRM::Connection.new(endpoint: 'http://INSTANCE_IP:5985/wsman', user: 'Administrator', password: 'password'); conn.shell(:powershell) { |s| puts s.run('hostname').stdout }"
```
Verify security group rules:
```bash
aws ec2 describe-security-groups --group-ids sg-abc123 | jq '.SecurityGroups[].IpPermissions'
```
Prevention Best Practices
1. Avoid provisioners when possible; use cloud-init, user data, or configuration management tools instead.
2. Use longer timeouts for slow-booting instances.
3. Configure security groups before provisioning.
4. Store SSH keys securely and use proper file permissions.
5. Use null_resource to separate provisioning from resource creation.
6. Test connections manually before running Terraform.
7. Handle destroy provisioners with on_failure = continue.
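
To illustrate the first practice, here is a minimal sketch that replaces a remote-exec provisioner with user data, assuming an Ubuntu AMI whose cloud-init runs the script as root on first boot. The resource name `web_userdata` is a placeholder for illustration.

```hcl
resource "aws_instance" "web_userdata" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Runs once at first boot via cloud-init; Terraform never opens an SSH connection
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
  EOF
}
```

Because Terraform makes no connection here, none of the timeout, authentication, or security group issues described in this guide apply; the trade-off is that Terraform does not wait for the script or report its failures.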