Your playbook runs fine for several tasks, then suddenly a host becomes unreachable. The initial connection worked, but subsequent tasks fail. This intermittent unreachability is frustrating because it can happen mid-deployment, leaving your systems in an inconsistent state.
The Error
```
fatal: [webserver]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.50 port 22: Connection timed out",
    "unreachable": true
}
```

Or during a playbook run:

```
TASK [Deploy application] *******************************************************
fatal: [webserver]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Shared connection to 192.168.1.50 closed.",
    "unreachable": true
}
```

Quick Diagnosis
Check the host status immediately:
```bash
# Basic connectivity
ping -c 3 192.168.1.50

# SSH connection test
ssh -o ConnectTimeout=10 user@192.168.1.50 "echo alive"

# Check if SSH is listening (options placed before the host for portability)
nc -zv -w 5 192.168.1.50 22
```
For Ansible-specific testing:
```bash
# Test with ping module
ansible webserver -m ping -u deploy

# Test with wait_for to check port availability
# (run on localhost so the check does not itself depend on SSH to the target)
ansible localhost -m wait_for -a "host=192.168.1.50 port=22 timeout=30"
```
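To run the same port check against every host in a group, a small playbook can probe the SSH port from the control node. This is a minimal sketch: the `webservers` group comes from the examples in this article, and the `search_regex: OpenSSH` line assumes the targets run OpenSSH and send its banner.

```yaml
# Sketch: probe each host's SSH port from the control node, without using SSH itself.
- hosts: webservers
  gather_facts: false            # fact gathering would need a working connection
  tasks:
    - name: Check that the SSH port answers
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: "{{ ansible_port | default(22) }}"
        search_regex: OpenSSH    # assumption: target runs OpenSSH; remove if unsure
        timeout: 30
      delegate_to: localhost     # the probe runs on the control node
```

A host that fails this task is refusing or dropping TCP connections, which points at the network or the SSH daemon rather than at Ansible.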
Common Causes and Fixes
Network Connectivity Issues
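The host might have intermittent network problems. A `rescue` block lets the play wait for the host and retry after a task fails (for example, a transfer cut off partway). Note that a host Ansible has already marked UNREACHABLE does not trigger `rescue`.
Add retry logic to your playbook:
```yaml
- hosts: webservers
  tasks:
    - name: Deploy application
      block:
        - name: Copy files
          copy:
            src: ./app/
            dest: /opt/app/
      rescue:
        - name: Wait for host to recover
          wait_for_connection:
            delay: 10
            timeout: 300

        - name: Retry copy
          copy:
            src: ./app/
            dest: /opt/app/
```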
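For the case where the host is reported as UNREACHABLE rather than failed, one possible pattern is to keep the host in the play with `ignore_unreachable` and retry based on the registered result. This is a sketch, assuming Ansible 2.7 or later (where `ignore_unreachable` is available); the variable name `copy_result` is illustrative.

```yaml
# Sketch: keep an unreachable host in the play and retry once it is back.
- hosts: webservers
  tasks:
    - name: Copy files
      copy:
        src: ./app/
        dest: /opt/app/
      ignore_unreachable: true      # do not drop the host if the connection fails
      register: copy_result

    - name: Recover and retry only when the first attempt was unreachable
      when: copy_result is unreachable
      block:
        - name: Wait for host to recover
          wait_for_connection:
            delay: 10
            timeout: 300

        - name: Retry copy
          copy:
            src: ./app/
            dest: /opt/app/
```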
**Or use the until loop:**

```yaml
- name: Ensure service is running
  service:
    name: myapp
    state: started
  register: result
  until: result is success
  retries: 3
  delay: 10
```

SSH Connection Drops Mid-Play
Long-running tasks or network issues can drop SSH connections.
Enable SSH keepalives in ansible.cfg:
```ini
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=3
pipelining = True
```

**Or in your SSH config (~/.ssh/config):**

```
Host *
    ServerAliveInterval 30
    ServerAliveCountMax 3
    TCPKeepAlive yes
```

Target Host Reboot or Restart
If a task triggers a reboot, subsequent tasks fail.
Handle reboots properly:
```yaml
- name: Reboot server
  reboot:
    msg: "Rebooting for kernel update"
    connect_timeout: 5
    reboot_timeout: 300
    pre_reboot_delay: 0
    post_reboot_delay: 30
```

Or manually with `wait_for_connection`:
```yaml
- name: Restart server
  shell: sleep 2 && shutdown -r now "Ansible reboot"
  async: 1
  poll: 0
  become: yes

- name: Wait for server to restart
  wait_for_connection:
    delay: 30
    timeout: 300
```
The `async: 1` and `poll: 0` pattern tells Ansible to fire and forget: it starts the reboot command in the background and moves on without polling for a result, so the task does not hang on the closed connection.
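After the connection returns, it can be worth confirming that the host really restarted instead of simply staying up. This is a rough sketch that re-gathers facts and checks uptime; it assumes the target reports the `uptime_seconds` fact (Linux hosts do), and the 600-second threshold is an arbitrary choice.

```yaml
# Sketch: confirm the host really restarted by checking its uptime.
- name: Re-gather facts after the reboot
  setup:

- name: Fail if the host has been up for too long
  fail:
    msg: "Host does not appear to have rebooted (uptime {{ ansible_facts.uptime_seconds }}s)"
  when: ansible_facts.uptime_seconds | int > 600   # assumed threshold; tune to your timeouts
```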
Firewall Changes Blocking SSH
A task that modifies firewall rules can block SSH and lock Ansible out of the host.
Use check mode first:
```yaml
- name: Check firewall rules
  iptables:
    chain: INPUT
    protocol: tcp
    destination_port: 22
    jump: ACCEPT
  check_mode: yes
  diff: yes
```

Add a rescue to restore access:

```yaml
- name: Configure firewall
  block:
    - name: Add firewall rule
      iptables:
        chain: INPUT
        protocol: tcp
        destination_port: 80
        jump: ACCEPT
  rescue:
    - name: Emergency SSH restore
      local_action:
        module: shell
        cmd: ssh {{ ansible_user }}@{{ ansible_host }} "iptables -A INPUT -p tcp --dport 22 -j ACCEPT"
```

The restore command is itself sent over SSH from the control node, so it only helps while the host still accepts new SSH sessions.

Target Resource Exhaustion
The host runs out of memory or file descriptors, causing SSH to fail.
Check target health first:
```yaml
- name: Check available memory
  shell: free -m | grep Mem | awk '{print $7}'
  register: free_memory
  changed_when: false   # read-only check

- name: Fail if low memory
  fail:
    msg: "Less than 100MB free memory"
  when: free_memory.stdout | int < 100

- name: Check disk space
  shell: df / | tail -1 | awk '{print $5}' | tr -d '%'
  register: disk_usage
  changed_when: false   # read-only check

- name: Fail if disk full
  fail:
    msg: "Disk usage above 90%"
  when: disk_usage.stdout | int > 90
```
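File descriptors can be checked in a similar way. This is a sketch assuming a Linux target that exposes `/proc/sys/fs/file-nr`, whose three fields are allocated handles, unused handles, and the system-wide maximum. The 90% threshold and the variable names are illustrative.

```yaml
# Sketch: fail early when system-wide file handles are nearly exhausted (Linux only).
- name: Read system-wide file handle counters
  command: cat /proc/sys/fs/file-nr
  register: file_nr
  changed_when: false   # read-only check

- name: Fail if file handles are nearly exhausted
  fail:
    msg: "More than 90% of file handles in use"
  vars:
    allocated: "{{ file_nr.stdout.split()[0] | int }}"   # first field: allocated handles
    maximum: "{{ file_nr.stdout.split()[2] | int }}"     # third field: system maximum
  when: (allocated | int) > (maximum | int) * 0.9
```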
SSH Session Limits
Too many parallel connections can overwhelm the target.
Limit parallelism:
```bash
# Run with fewer forks
ansible-playbook site.yml --forks 5
```

Or in ansible.cfg:

```ini
[defaults]
forks = 5
```
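Parallelism can also be capped for a single task instead of the whole run. This sketch uses the task-level `throttle` keyword (available since Ansible 2.9); the command path is hypothetical and stands in for any connection-heavy step.

```yaml
# Sketch: cap one heavy task at two hosts at a time; other tasks keep the normal fork count.
- hosts: webservers
  tasks:
    - name: Run a connection-heavy step
      command: /opt/app/bin/migrate   # hypothetical command, for illustration only
      throttle: 2
```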
Use serial execution for rolling updates:
```yaml
- hosts: webservers
  serial: 2  # Process 2 hosts at a time
  tasks:
    - name: Deploy app
      # ...
```

Target Python Missing or Broken
Ansible requires Python on the target. If Python is missing or broken:
```bash
# Test Python on target
ansible webserver -m raw -a "python3 --version"
```

Bootstrap Python first:
```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Install Python
      raw: test -e /usr/bin/python3 || (apt update && apt install -y python3)
      changed_when: false

    - name: Gather facts
      setup:
```
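If Python is installed but Ansible picks up a missing or broken interpreter, pinning the interpreter path per host may help. This is a minimal sketch in YAML inventory format; the file name and the `/usr/bin/python3` path are assumptions to adjust for your targets.

```yaml
# inventory.yml (hypothetical file name)
all:
  hosts:
    webserver:
      ansible_host: 192.168.1.50
      ansible_python_interpreter: /usr/bin/python3   # assumed path; adjust to the target
```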
Handling Unreachable Hosts in Playbooks
Ignore unreachable hosts and continue:
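```yaml
- hosts: webservers
  ignore_unreachable: yes
  tasks:
    - name: Deploy app
      # ...
```

Report hosts that dropped out at the end of the play. The comparison has to run inside the same play, because `ansible_play_hosts_all` and `ansible_play_hosts` describe the current play only:

```yaml
- hosts: webservers
  tasks:
    - name: Deploy app
      # ...

    - name: Report unreachable
      debug:
        # ansible_play_hosts excludes failed hosts as well as unreachable ones
        msg: "Unreachable hosts: {{ ansible_play_hosts_all | difference(ansible_play_hosts) }}"
      run_once: true
```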
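Hosts that were dropped as unreachable can also be given another chance later in the same play with `meta: clear_host_errors`. This is a sketch, assuming at least one host in the play is still active when the meta task runs; the task names are illustrative.

```yaml
# Sketch: reactivate hosts that dropped out earlier, then check the connection again.
- hosts: webservers
  tasks:
    - name: First step that may lose some hosts
      ping:

    - name: Reactivate hosts that were marked unreachable
      meta: clear_host_errors

    - name: Confirm the connection is back
      wait_for_connection:
        timeout: 60
```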
Set max failure percentage:
```yaml
- hosts: webservers
  max_fail_percentage: 30  # Stop if more than 30% fail
  tasks:
    - name: Deploy app
      # ...
```

Verification
After making changes:
```bash
# Run with verbose output to see connection details
ansible-playbook site.yml -vvv --limit webserver

# Check connection persistence
ansible webserver -m ping -u deploy -f 1

# Run simple connectivity test
ansible webserver -m wait_for_connection -a "timeout=60"
```
Prevention
1. **Enable keepalives:** Configure SSH keepalives in ansible.cfg
2. **Use serial execution:** Process hosts in batches
3. **Add retry logic:** Use `until`, `retries`, and `rescue`
4. **Handle reboots:** Use `wait_for_connection` after reboots
5. **Monitor resources:** Check disk and memory before deployment