The remote-exec provisioner runs scripts on remote resources after creation. Connection and execution errors are common due to SSH configuration, timing, and script issues.
## Understanding Remote-Exec Errors

Common remote-exec errors:

```
Error: Failed to connect: timeout
Error: SSH authentication failed: permission denied
Error: Script execution failed: exit status 127
Error: Connection refused: cannot reach host
```
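Among these, `exit status 127` is the shell's own code for a command it could not find, not a Terraform error. It can be reproduced locally to confirm what the provisioner is reporting (a minimal sketch, no Terraform involved):

```shell
# 127 means the shell could not find the command at all, as opposed to
# 126 (found but not executable) or the command's own exit codes.
sh -c 'nonexistent-command-xyz' 2>/dev/null
echo "exit code: $?"   # prints: exit code: 127
```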
## Issue 1: SSH Connection Timeout

The instance is not yet accepting SSH connections when the provisioner runs.

Error Example:

```
Error: Failed to connect to remote host
Error: timeout after 5 minutes waiting for SSH
```

Root Cause:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "5m" # May be too short for cloud-init
    }
  }
}
```
Solution:

Wait for instance readiness before provisioning:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # First provisioner waits for cloud-init
  provisioner "remote-exec" {
    inline = [
      "while ! test -f /var/lib/cloud/instance/boot-finished; do sleep 2; done"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
      timeout     = "15m"
    }
  }

  # Main provisioner runs after boot completes
  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
Or use null_resource for better control:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
}

resource "null_resource" "wait_for_ssh" {
  depends_on = [aws_instance.web]

  provisioner "local-exec" {
    command = <<-EOT
      for i in {1..30}; do
        if ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
            -i ~/.ssh/id_rsa ubuntu@${aws_instance.web.public_ip} \
            "echo connected"; then
          break
        fi
        sleep 10
      done
    EOT
  }
}

resource "null_resource" "configure" {
  depends_on = [null_resource.wait_for_ssh]

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = aws_instance.web.public_ip
    }
  }
}
```
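The retry loop inside the local-exec command can be factored into a standalone script and tested outside Terraform before wiring it in. This is an illustrative sketch: `wait_for` is a hypothetical helper, and the real probe would be the ssh "echo connected" command:

```shell
#!/bin/sh
# Generic retry loop: run a probe command until it succeeds or the
# attempt budget runs out. Arguments: max attempts, delay between
# attempts in seconds, then the probe command itself.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}

# Demo probe: `true` succeeds immediately.
wait_for 30 0 true   # prints: ready after 1 attempt(s)
```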
## Issue 2: SSH Authentication Failure

Wrong user, key, or authentication method.

Error Example:

```
Error: SSH authentication failed
Permission denied (publickey)
```
Solution:

Match the user to the AMI type:

```hcl
# EC2 AMI default SSH users:
#   Amazon Linux 2: ec2-user
#   Ubuntu:         ubuntu
#   Debian:         admin or debian
#   RHEL:           ec2-user or root
#   CentOS:         centos
#   Fedora:         fedora
#   SUSE:           ec2-user

resource "aws_instance" "web" {
  ami           = var.ubuntu_ami # Ubuntu AMI
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["echo connected"]

    connection {
      type        = "ssh"
      user        = "ubuntu" # Correct for Ubuntu
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}

resource "aws_instance" "amazon_linux" {
  ami           = var.amazon_ami # Amazon Linux AMI
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = ["echo connected"]

    connection {
      type        = "ssh"
      user        = "ec2-user" # Correct for Amazon Linux
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
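For scripts that need to pick the SSH user automatically, the mapping above can be wrapped in a small helper. `ssh_user_for` is a hypothetical name, not a Terraform or AWS facility:

```shell
#!/bin/sh
# Map a rough AMI family name to its conventional default SSH user,
# mirroring the comment table in the HCL example above.
ssh_user_for() {
  case "$1" in
    amazon*|rhel*|suse*) echo "ec2-user" ;;
    ubuntu*)             echo "ubuntu" ;;
    debian*)             echo "admin" ;;
    centos*)             echo "centos" ;;
    fedora*)             echo "fedora" ;;
    *)                   echo "unknown"; return 1 ;;
  esac
}

ssh_user_for ubuntu-22.04     # prints: ubuntu
ssh_user_for amazon-linux-2   # prints: ec2-user
```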
Ensure the key matches the instance:

```hcl
resource "aws_key_pair" "deployer" {
  key_name   = "deployer-key"
  public_key = file("~/.ssh/id_rsa.pub")
}

resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"
  key_name      = aws_key_pair.deployer.key_name

  provisioner "remote-exec" {
    inline = ["echo 'connected successfully'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa") # Private key matches the key pair
      host        = self.public_ip
    }
  }
}
```
## Issue 3: Script Execution Failures

Commands fail to run on the remote host.

Error Example:

```
Error: Script execution failed
Error: exit status 127: command not found
```
Solution:

Install required tools before using them:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = [
      # Install prerequisites first
      "sudo apt-get update -qq",
      "sudo apt-get install -y curl jq",

      # Then run commands that need them
      "curl -s https://api.example.com/config | jq -r '.value'"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
Use inline scripts with proper error handling:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  provisioner "remote-exec" {
    inline = [
      <<-SCRIPT
        set -e

        # Check for required tools
        for cmd in curl jq aws; do
          if ! command -v $cmd > /dev/null 2>&1; then
            echo "ERROR: $cmd not found"
            exit 1
          fi
        done

        # Run main script
        ./deploy-app.sh --verbose
      SCRIPT
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```

Note the POSIX-safe redirect `> /dev/null 2>&1`: remote-exec scripts are not guaranteed to run under bash, so bash-only syntax like `&>` can itself produce failures.
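The `command -v` guard used in the inline script behaves the same in any POSIX shell and can be sanity-checked locally; the tool names here are just examples:

```shell
#!/bin/sh
# Report which tools are present. `sh` exists on any POSIX system,
# while the second name is deliberately bogus.
for cmd in sh definitely-missing-tool; do
  if command -v "$cmd" > /dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: missing"
  fi
done
# prints:
#   sh: found
#   definitely-missing-tool: missing
```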
## Issue 4: File Upload Issues

Files cannot be uploaded to the remote host.

Error Example:

```
Error: Failed to upload file: permission denied
```
Solution:

Use the file provisioner to upload, then remote-exec to run:

```hcl
resource "aws_instance" "web" {
  ami           = var.ami
  instance_type = "t3.micro"

  # Upload files first
  provisioner "file" {
    source      = "config/app.conf"
    destination = "/tmp/app.conf"

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  provisioner "file" {
    source      = "scripts/" # Directory
    destination = "/tmp/scripts/"

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }

  # Then run the uploaded scripts
  provisioner "remote-exec" {
    inline = [
      "sudo mkdir -p /etc/app",
      "sudo mv /tmp/app.conf /etc/app/",
      "chmod +x /tmp/scripts/*.sh",
      "/tmp/scripts/deploy.sh"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
## Issue 5: Private IP Connectivity

Instances in private subnets cannot be reached directly.

Error Example:

```
Error: Failed to connect to 10.0.2.50: network unreachable
```
Solution:

Use a bastion host for private instances:

```hcl
resource "aws_instance" "bastion" {
  ami                         = var.ami
  instance_type               = "t3.micro"
  subnet_id                   = var.public_subnet_id
  key_name                    = var.key_name
  associate_public_ip_address = true
}

resource "aws_instance" "private" {
  ami           = var.ami
  instance_type = "t3.micro"
  subnet_id     = var.private_subnet_id
  key_name      = var.key_name

  provisioner "remote-exec" {
    inline = ["sudo apt-get update -y"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.private_ip

      # Bastion configuration
      bastion_host        = aws_instance.bastion.public_ip
      bastion_user        = "ubuntu"
      bastion_private_key = file("~/.ssh/id_rsa")
    }
  }
}
```
## Issue 6: Windows WinRM Issues

WinRM connections fail for Windows instances.

Error Example:

```
Error: WinRM connection failed: endpoint unreachable
```
Solution:

Configure WinRM via user data:

```hcl
resource "aws_instance" "windows" {
  ami           = var.windows_ami
  instance_type = "t3.medium"

  user_data = <<-EOF
    <powershell>
    Enable-PSRemoting -Force
    Set-Item WSMan:\localhost\Client\TrustedHosts -Value '*' -Force

    # Configure basic auth
    winrm set winrm/config/service/auth '@{Basic="true"}'
    winrm set winrm/config/service '@{AllowUnencrypted="true"}'

    # For HTTPS (recommended in production)
    $cert = New-SelfSignedCertificate -DnsName $env:COMPUTERNAME
    winrm create winrm/config/listener?Address=*+Transport=HTTPS @{CertificateThumbprint=$cert.Thumbprint}
    </powershell>
  EOF

  provisioner "remote-exec" {
    inline = [
      "powershell -Command \"Install-WindowsFeature -Name Web-Server\"",
      "powershell -Command \"New-Item -Path C:\\inetpub\\wwwroot\\index.html -ItemType File -Value 'Hello Terraform'\""
    ]

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = var.admin_password
      host     = self.public_ip
      port     = 5985
      timeout  = "10m"

      # For HTTPS:
      # use_https = true
      # port      = 5986
      # insecure  = true
    }
  }
}
```
## Issue 7: Destroy Provisioner Failures

The provisioner fails while the instance is being destroyed.

Error Example:

```
Error: Connection refused during destroy provisioner
Instance may already be terminated
```
Solution:

Handle destroy gracefully. Two details matter here: destroy-time provisioners may only reference `self`, so the host must be captured in `triggers` at create time, and `on_failure` is a provisioner argument, not a connection argument:

```hcl
resource "null_resource" "cleanup" {
  # Capture the host at create time; destroy provisioners
  # cannot reference aws_instance.web directly.
  triggers = {
    host = aws_instance.web.public_ip
  }

  provisioner "remote-exec" {
    when = destroy

    # Don't fail Terraform if cleanup fails
    on_failure = continue

    inline = [
      "rm -rf /tmp/application-data",
      "docker stop $(docker ps -q) || true",
      "docker system prune -f"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.triggers.host
    }
  }
}
```
## Issue 8: Security Group Blocking SSH

The SSH port is not open in the security group.

Error Example:

```
Error: Connection refused: SSH port 22 blocked
```
Solution:

Add an SSH ingress rule:

```hcl
resource "aws_security_group" "allow_ssh" {
  name        = "allow_ssh"
  description = "Allow SSH inbound traffic"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from Terraform"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.allowed_ssh_cidr] # Restrict to your IP
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = var.ami
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.allow_ssh.id]

  provisioner "remote-exec" {
    inline = ["echo 'connected'"]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
      host        = self.public_ip
    }
  }
}
```
## Verification Steps

Test SSH connectivity manually:

```bash
# Test SSH directly with verbose output
ssh -vvv -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP

# Test with a timeout
timeout 30 ssh -i ~/.ssh/id_rsa ubuntu@INSTANCE_IP "echo connected"

# Check security groups
aws ec2 describe-instance-attribute \
  --instance-id i-abc123 \
  --attribute groupSet
```
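To rule out a blocked security group before involving SSH keys at all, a plain TCP probe of port 22 is enough. This sketch uses bash's `/dev/tcp` redirection so it needs no extra tools; `check_ssh_port` is an illustrative helper and INSTANCE_IP a placeholder:

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if a TCP connection to port 22 on the host opens.
# A timeout distinguishes "filtered/unreachable" from a slow handshake.
check_ssh_port() {
  local host=$1
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/22" 2>/dev/null; then
    echo "port 22 open on ${host}"
  else
    echo "port 22 unreachable on ${host}"
    return 1
  fi
}

# Usage: check_ssh_port INSTANCE_IP
```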
Debug provisioner execution:

```bash
export TF_LOG=DEBUG
terraform apply 2>&1 | grep -i ssh
```
## Prevention Best Practices

1. Use cloud-init/user_data instead of remote-exec when possible
2. Wait for cloud-init completion before provisioning
3. Use the correct SSH user for your AMI type
4. Configure security groups before provisioning
5. Use bastion hosts for private subnet instances
6. Add `on_failure = continue` for destroy provisioners
7. Test SSH connectivity manually before running Terraform
8. Consider configuration management tools instead of provisioners