Your playbook runs fine for several tasks, then suddenly a host becomes unreachable. The initial connection worked, but subsequent tasks fail. This intermittent unreachability is frustrating because it can happen mid-deployment, leaving your systems in an inconsistent state.

The Error

bash
fatal: [webserver]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.50 port 22: Connection timed out",
    "unreachable": true
}

Or during a playbook run:

bash
TASK [Deploy application] *******************************************************
fatal: [webserver]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Shared connection to 192.168.1.50 closed.",
    "unreachable": true
}

Quick Diagnosis

Check the host status immediately:

```bash
# Basic connectivity
ping -c 3 192.168.1.50

# SSH connection test
ssh -o ConnectTimeout=10 user@192.168.1.50 "echo alive"

# Check if SSH is listening (options placed before the host for portability)
nc -zv -w 5 192.168.1.50 22
```

For Ansible-specific testing:

```bash
# Test with ping module
ansible webserver -m ping -u deploy

# Test port availability with wait_for, run from the control node
# (running it on webserver itself would need the very connection being tested)
ansible localhost -m wait_for -a "host=192.168.1.50 port=22 timeout=30"
```

Common Causes and Fixes

Network Connectivity Issues

The host might have intermittent network problems.
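
Before changing the playbook, it can help to confirm that the problem really is intermittent. The following is a minimal sketch, not part of Ansible: `count_failures` is a hypothetical helper that repeats any probe command and reports how many attempts failed. The address and port in the usage comment are the ones from the error above.

```bash
#!/usr/bin/env bash
# count_failures ATTEMPTS INTERVAL COMMAND [ARGS...]
# Runs COMMAND ATTEMPTS times, sleeping INTERVAL seconds between runs,
# and prints how many of the runs exited non-zero.
count_failures() {
  local attempts=$1 interval=$2
  shift 2
  local failures=0 i
  for ((i = 1; i <= attempts; i++)); do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    # No need to sleep after the final attempt
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  echo "$failures"
}

# Example: probe the SSH port 20 times, 3 seconds apart
# count_failures 20 3 nc -z -w 5 192.168.1.50 22
```

A result of 0 over a long enough window suggests the network path is stable and the cause lies elsewhere; a result between 0 and the attempt count points to intermittent loss.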

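Add retry logic to your playbook. A rescue section runs only for failed tasks, not for unreachable hosts, so the task below ignores the unreachable state and converts it into an explicit failure:

```yaml
- hosts: webservers
  tasks:
    - name: Deploy application
      block:
        - name: Copy files
          copy:
            src: ./app/
            dest: /opt/app/
          register: copy_result
          # Keep the host in the play so the next task can inspect the result
          ignore_unreachable: true

        - name: Turn unreachable into a rescuable failure
          fail:
            msg: "Host became unreachable during copy"
          when: copy_result.unreachable | default(false)
      rescue:
        - name: Wait for host to recover
          wait_for_connection:
            delay: 10
            timeout: 300

        - name: Retry copy
          copy:
            src: ./app/
            dest: /opt/app/
```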

**Or use the until loop:**

yaml
- name: Ensure service is running
  service:
    name: myapp
    state: started
  register: result
  until: result is success
  retries: 3
  delay: 10

SSH Connection Drops Mid-Play

Long-running tasks or network issues can drop SSH connections.

Enable SSH keepalives in ansible.cfg:

ini
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=3
pipelining = True

**Or in your SSH config (~/.ssh/config):**

bash
Host *
    ServerAliveInterval 30
    ServerAliveCountMax 3
    TCPKeepAlive yes
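
With these values the client sends a probe after 30 seconds of silence and gives up after 3 unanswered probes, so a dead connection is detected after roughly 30 × 3 = 90 seconds. As a rough sketch, the same arithmetic can be run against the settings SSH actually resolves for a host. `keepalive_window` is a hypothetical helper name; it assumes the lowercase `keyword value` lines that `ssh -G` prints, and it reflects `~/.ssh/config` only, not the `ssh_args` set in ansible.cfg.

```bash
#!/usr/bin/env bash
# keepalive_window: reads `ssh -G <host>` output on stdin and prints
# ServerAliveInterval * ServerAliveCountMax in seconds.
keepalive_window() {
  awk '
    tolower($1) == "serveraliveinterval" { interval = $2 }
    tolower($1) == "serveralivecountmax" { count = $2 }
    END { print interval * count }
  '
}

# Example:
# ssh -G 192.168.1.50 | keepalive_window
```

A result of 0 means ServerAliveInterval is 0, that is, keepalives are disabled for that host.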

Target Host Reboot or Restart

If a task triggers a reboot, the connection drops and subsequent tasks fail until the host is back.

Handle reboots properly:

yaml
- name: Reboot server
  reboot:
    msg: "Rebooting for kernel update"
    connect_timeout: 5
    reboot_timeout: 300
    pre_reboot_delay: 0
    post_reboot_delay: 30

Or manually with wait_for_connection:

```yaml
- name: Restart server
  shell: sleep 2 && shutdown -r now "Ansible reboot"
  async: 1
  poll: 0
  become: yes

- name: Wait for server to restart
  wait_for_connection:
    delay: 30
    timeout: 300
```

The async: 1 and poll: 0 pattern tells Ansible to fire-and-forget the reboot command, preventing it from hanging on the closed connection.

Firewall Changes Blocking SSH

A task that modifies firewall rules can lock Ansible out of the host.

Use check mode first:

yaml
- name: Check firewall rules
  iptables:
    chain: INPUT
    protocol: tcp
    destination_port: 22
    jump: ACCEPT
  check_mode: yes
  diff: yes

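Add a rescue to restore access. Rescue runs only when a task fails, not when the host is already unreachable, and the command below works only while some route to the host's SSH port is still open:

yaml
- name: Configure firewall
  block:
    - name: Add firewall rule
      iptables:
        chain: INPUT
        protocol: tcp
        destination_port: 80
        jump: ACCEPT
  rescue:
    - name: Emergency SSH restore
      local_action:
        module: shell
        # -I INPUT 1 puts the rule first so an earlier DROP rule cannot shadow it
        cmd: ssh {{ ansible_user }}@{{ ansible_host }} "iptables -I INPUT 1 -p tcp --dport 22 -j ACCEPT"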

Target Resource Exhaustion

The host runs out of memory or file descriptors, causing SSH to fail.

Check target health first:

```yaml
- name: Check available memory
  shell: free -m | grep Mem | awk '{print $7}'
  register: free_memory

- name: Fail if low memory
  fail:
    msg: "Less than 100MB free memory"
  when: free_memory.stdout | int < 100

- name: Check disk space
  shell: df / | tail -1 | awk '{print $5}' | tr -d '%'
  register: disk_usage

- name: Fail if disk full
  fail:
    msg: "Disk usage above 90%"
  when: disk_usage.stdout | int > 90
```

SSH Session Limits

Too many parallel connections can overwhelm the target.

Limit parallelism:

```bash
# Run with fewer forks
ansible-playbook site.yml --forks 5

# Or in ansible.cfg
[defaults]
forks = 5
```

Use serial execution for rolling updates:

yaml
- hosts: webservers
  serial: 2  # Process 2 hosts at a time
  tasks:
    - name: Deploy app
      # ...

Target Python Missing or Broken

Ansible requires Python on the target. If Python is missing or broken:

bash
# Test Python on target
ansible webserver -m raw -a "python3 --version"

Bootstrap Python first:

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Install Python
      raw: test -e /usr/bin/python3 || (apt update && apt install -y python3)
      changed_when: false

    - name: Gather facts
      setup:
```

Handling Unreachable Hosts in Playbooks

Ignore unreachable hosts and continue:

yaml
- hosts: webservers
  ignore_unreachable: yes
  tasks:
    - name: Deploy app
      # ...

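Report failed or unreachable hosts at the end of the play. The comparison has to run inside the same play, because ansible_play_hosts_all and ansible_play_hosts describe only the current play:

```yaml
- hosts: webservers
  tasks:
    - name: Deploy app
      # ...

    - name: Report unreachable
      debug:
        msg: "Failed or unreachable hosts: {{ ansible_play_hosts_all | difference(ansible_play_hosts) }}"
      # Print once, from the control node, on behalf of a host that is still active
      run_once: true
      delegate_to: localhost
```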

Set max failure percentage:

yaml
- hosts: webservers
  max_fail_percentage: 30  # Stop if more than 30% fail
  tasks:
    - name: Deploy app
      # ...

Verification

After making changes:

```bash
# Run with verbose output to see connection details
ansible-playbook site.yml -vvv --limit webserver

# Check connection persistence
ansible webserver -m ping -u deploy -f 1

# Run simple connectivity test
ansible webserver -m wait_for_connection -a "timeout=60"
```
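
To check a whole run rather than a single host, the PLAY RECAP at the end of the output lists an `unreachable=N` count per host. The sketch below pulls out the hosts with a non-zero count. `unreachable_hosts` is a hypothetical helper, and it assumes the default recap format with color disabled, which is the case when output is piped or saved to a file.

```bash
#!/usr/bin/env bash
# unreachable_hosts: reads ansible-playbook output on stdin and prints
# the name of every host whose recap line shows unreachable > 0.
unreachable_hosts() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && match($0, /unreachable=[0-9]+/) {
      # "unreachable=" is 12 characters long; the digits follow it
      n = substr($0, RSTART + 12, RLENGTH - 12)
      if (n + 0 > 0) {
        host = $1
        sub(/:$/, "", host)
        print host
      }
    }
  '
}

# Example:
# ansible-playbook site.yml | tee run.log
# unreachable_hosts < run.log
```

Empty output means no host was reported unreachable in that run.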

Prevention

  1. Enable keepalives: Configure SSH keepalives in ansible.cfg
  2. Use serial execution: Process hosts in batches
  3. Add retry logic: Use until, retries, and rescue
  4. Handle reboots: Use wait_for_connection after reboots
  5. Monitor resources: Check disk and memory before deployment