The remote-exec provisioner runs scripts on remote resources after creation. Connection and execution errors are common due to SSH configuration, timing, and script issues.

Understanding Remote-Exec Errors

Common remote-exec errors:

```
Error: Failed to connect: timeout
Error: SSH authentication failed: permission denied
Error: Script execution failed: exit status 127
Error: Connection refused: cannot reach host
```

Issue 1: SSH Connection Timeout

The instance is not yet ready to accept SSH connections when the provisioner runs.

Error Example:

```
Error: Failed to connect to remote host
Error: timeout after 5 minutes waiting for SSH
```

Root Cause:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "5m" # May be too short for cloud-init
    }
  }
}
```

Solution:

Wait for instance readiness before provisioning:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # First provisioner waits for cloud-init
  provisioner "remote-exec" {
    inline = [
      "while ! test -f /var/lib/cloud/instance/boot-finished; do sleep 2; done"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "15m"
    }
  }

  # Main provisioner runs after boot completes
  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Or use null_resource for better control:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

resource "null_resource" "wait_for_ssh" {
  depends_on = [aws_instance.web]

  provisioner "local-exec" {
    command = <<-EOT
      for i in {1..30}; do
        if ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
            -i ~/.ssh/id_rsa ubuntu@${aws_instance.web.public_ip} \
            "echo connected"; then
          break
        fi
        sleep 10
      done
    EOT
  }
}

resource "null_resource" "configure" {
  depends_on = [null_resource.wait_for_ssh]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
    }
  }
}
```

Issue 2: SSH Authentication Failure

Wrong user, key, or authentication method.

Error Example:

```
Error: SSH authentication failed
Permission denied (publickey)
```

Solution:

Match the SSH user to the AMI type:

```hcl
# Default EC2 AMI users
# Amazon Linux 2: ec2-user
# Ubuntu:         ubuntu
# Debian:         admin or debian
# RHEL:           ec2-user or root
# CentOS:         centos
# Fedora:         fedora
# SUSE:           ec2-user

resource "aws_instance" "web" {
  ami = var.ubuntu_ami # Ubuntu AMI

  provisioner "remote-exec" {
    inline = ["echo connected"]

    connection {
      type        = "ssh"
      user        = "ubuntu" # Correct for Ubuntu
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}

resource "aws_instance" "amazon_linux" {
  ami = var.amazon_ami # Amazon Linux AMI

  provisioner "remote-exec" {
    inline = ["echo connected"]

    connection {
      type        = "ssh"
      user        = "ec2-user" # Correct for Amazon Linux
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
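The AMI-to-user mapping above can also be encoded once and looked up, so the SSH user always follows the AMI choice. A sketch; the `local` name and `var.ami_family` are illustrative, and the user names are the defaults listed above:

```hcl
locals {
  # Default SSH user per AMI family (from the mapping above)
  ami_ssh_user = {
    amazon_linux = "ec2-user"
    ubuntu       = "ubuntu"
    debian       = "admin"
    rhel         = "ec2-user"
    centos       = "centos"
    fedora       = "fedora"
    suse         = "ec2-user"
  }
}

# Then, inside the connection block:
#   user = local.ami_ssh_user[var.ami_family]
```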

Ensure the key matches the instance:

```hcl
resource "aws_key_pair" "deployer" {
  key_name   = "deployer-key"
  public_key = file("~/.ssh/id_rsa.pub")
}

resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
  key_name      = aws_key_pair.deployer.key_name

  provisioner "remote-exec" {
    inline = ["echo 'connected successfully'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa") # Private key matches the key pair
      host        = self.public_ip
    }
  }
}
```

Issue 3: Script Execution Failures

Commands fail to run on the remote host.

Error Example:

```
Error: Script execution failed
Error: exit status 127: command not found
```

Solution:

Handle command availability:

```hcl
resource "aws_instance" "web" {
  provisioner "remote-exec" {
    inline = [
      # Install prerequisites first
      "sudo apt-get update -qq",
      "sudo apt-get install -y curl jq",

      # Then run commands that need them
      "curl -s https://api.example.com/config | jq -r '.value'"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Use inline scripts with proper error handling:

```hcl
resource "aws_instance" "web" {
  provisioner "remote-exec" {
    inline = [
      <<-SCRIPT
        set -e

        # Check for required tools (POSIX-compatible redirect)
        for cmd in curl jq aws; do
          if ! command -v $cmd > /dev/null 2>&1; then
            echo "ERROR: $cmd not found"
            exit 1
          fi
        done

        # Run main script
        ./deploy-app.sh --verbose
      SCRIPT
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Issue 4: File Upload Issues

Cannot upload files via remote-exec.

Error Example:

```
Error: Failed to upload file: permission denied
```

Solution:

Use the file provisioner:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Upload files first
  provisioner "file" {
    source      = "config/app.conf"
    destination = "/tmp/app.conf"

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  provisioner "file" {
    source      = "scripts/" # Directory
    destination = "/tmp/scripts/"

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  # Then run scripts
  provisioner "remote-exec" {
    inline = [
      "sudo mkdir -p /etc/app",
      "sudo mv /tmp/app.conf /etc/app/",
      "chmod +x /tmp/scripts/*.sh",
      "/tmp/scripts/deploy.sh"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Issue 5: Private IP Connectivity

Cannot reach instances in private subnets.

Error Example:

```
Error: Failed to connect to 10.0.2.50: network unreachable
```

Solution:

Use a bastion host for private instances:

```hcl
resource "aws_instance" "bastion" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.public_subnet_id
  key_name      = var.key_name

  associate_public_ip_address = true
}

resource "aws_instance" "private" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.private_subnet_id
  key_name      = var.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.private_ip

      # Bastion configuration
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/id_rsa")
    }
  }
}
```

Issue 6: Windows WinRM Issues

WinRM connections fail for Windows instances.

Error Example:

```
Error: WinRM connection failed: endpoint unreachable
```

Solution:

Configure WinRM via user data:

```hcl
resource "aws_instance" "windows" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  user_data = <<-EOF
    <powershell>
    Enable-PSRemoting -Force
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force

    # Configure basic auth
    winrm set winrm/config/service/auth '@{Basic="true"}'
    winrm set winrm/config/service '@{AllowUnencrypted="true"}'

    # For HTTPS (recommended in production)
    $cert = New-SelfSignedCertificate -DnsName $env:COMPUTERNAME
    New-Item -Path WSMan:\localhost\Listener -Transport HTTPS -Address * -CertificateThumbprint $cert.Thumbprint -Force
    </powershell>
  EOF

  provisioner "remote-exec" {
    inline = [
      "powershell -Command \"Install-WindowsFeature -Name Web-Server\"",
      "powershell -Command \"New-Item -Path C:\\inetpub\\wwwroot\\index.html -ItemType File -Value 'Hello Terraform'\""
    ]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5985
      timeout  = "10m"

      # For HTTPS:
      # https    = true
      # port     = 5986
      # insecure = true
    }
  }
}
```

Issue 7: Destroy Provisioner Failures

Provisioner fails when instance is being destroyed.

Error Example:

```
Error: Connection refused during destroy provisioner
Instance may already be terminated
```

Solution:

Handle destroy gracefully. Note that destroy-time provisioners may only reference `self`, so capture the connection details in `triggers` rather than referencing the instance directly:

```hcl
resource "null_resource" "cleanup" {
  # Destroy provisioners cannot reference other resources,
  # so store the host in triggers and read it via self
  triggers = {
    host = aws_instance.web.public_ip
  }

  provisioner "remote-exec" {
    when = destroy

    # Don't fail Terraform if cleanup fails
    on_failure = continue

    inline = [
      "rm -rf /tmp/application-data",
      "docker stop $(docker ps -q) || true",
      "docker system prune -f"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.triggers.host
    }
  }
}
```

Issue 8: Security Group Blocking SSH

SSH port not open in security group.

Error Example:

```
Error: Connection refused: SSH port 22 blocked
```

Solution:

Add an SSH ingress rule:

```hcl
resource "aws_security_group" "allow_ssh" {
  name        = "allow_ssh"
  description = "Allow SSH inbound traffic"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from Terraform"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.allowed_ssh_cidr] # Restrict to your IP
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = var.ami
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.allow_ssh.id]

  provisioner "remote-exec" {
    inline = ["echo 'connected'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Verification Steps

Test SSH connectivity manually:

```bash
# Test SSH directly
ssh -vvv -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Test with timeout
timeout 30 ssh -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP "echo connected"

# Check security groups
aws ec2 describe-instance-attribute \
  --instance-id i-abc123 \
  --attribute groupSet
```

Debug provisioner execution:

```bash
export TF_LOG=DEBUG
terraform apply 2>&1 | grep -i ssh
```

Prevention Best Practices

1. Use cloud-init/user_data instead of remote-exec when possible
2. Wait for cloud-init completion before provisioning
3. Use the correct SSH user for your AMI type
4. Configure security groups before provisioning
5. Use bastion hosts for private subnet instances
6. Add `on_failure = continue` to destroy provisioners
7. Test SSH connectivity manually before running Terraform
8. Consider configuration management tools instead of provisioners
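The first practice deserves emphasis: a bootstrap script delivered via `user_data` runs under cloud-init on first boot, so Terraform never needs an SSH connection at all and every issue above disappears. A minimal sketch; the script contents and package are illustrative:

```hcl
# Bootstrap via user_data instead of remote-exec: cloud-init runs
# the script on first boot, so no SSH connection is required.
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx   # illustrative package
  EOF
}
```

The trade-off: user_data failures surface in the instance's cloud-init logs (`/var/log/cloud-init-output.log`) rather than in Terraform's output, so verify the script independently.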