Introduction

A Redis failover promotes a replica to primary, so the node that accepts writes changes. Applications that cache the old node address, reconnect to a replica, or ignore Sentinel and cluster discovery can keep sending writes to a read-only replica long after the failover has completed.

Symptoms

  • Applications log "READONLY You can't write against a read only replica"
  • Read traffic still works, but writes start failing after a failover or maintenance event
  • Restarting the app sometimes fixes the issue temporarily
  • Different app instances behave differently depending on when they reconnected
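
To compare how different app instances behave, it can help to count how often the READONLY error appears in each instance's log. The following is a minimal sketch, assuming the application writes the Redis error text to its log unchanged; the function name count_readonly_errors is hypothetical.

```shell
# Count log lines that contain the Redis READONLY write error.
# Reads log text on stdin and prints the number of matching lines.
count_readonly_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps that from aborting callers that run under "set -e".
  grep -c -F "READONLY You can't write against a read only replica" || true
}
```

For example, count_readonly_errors < app.log, where app.log stands in for the instance's real log file. An instance that reconnected after the failover should report 0.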

Common Causes

  • The application uses a replica endpoint instead of the primary endpoint
  • Sentinel or cluster discovery is misconfigured or disabled
  • DNS or connection pools keep stale addresses after failover
  • A managed Redis service promoted a new primary but the client never refreshed topology
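
Where Sentinel is in use, the current primary can be queried directly and compared with the address the application is configured to use. The sketch below assumes redis-cli is run non-interactively, so the SENTINEL get-master-addr-by-name reply arrives as two lines (IP, then port). The function name format_master_addr is hypothetical.

```shell
# Turn the two-line reply of SENTINEL get-master-addr-by-name
# (IP on the first line, port on the second) into "host:port".
# Returns non-zero when either value is missing, for example when
# Sentinel does not know the requested master name.
format_master_addr() {
  tr -d '\r' | awk '
    NR == 1 { host = $0 }
    NR == 2 { port = $0 }
    END {
      if (host == "" || port == "") exit 1
      print host ":" port
    }'
}
```

A possible invocation is redis-cli -h sentinel.example.internal -p 26379 SENTINEL get-master-addr-by-name mymaster | format_master_addr. The Sentinel hostname and the master name mymaster are placeholders; 26379 is the conventional Sentinel port.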

Step-by-Step Fix

  1. Identify which node the application is writing to
     Run INFO replication against the host the application is actually connected to (redis-replica.example.internal in this example) and read the role field: role:master means primary, role:slave means read-only replica.
bash
redis-cli -h redis-replica.example.internal -p 6379 INFO replication
  2. Verify the client points to the correct write endpoint
     Managed services and Sentinel setups usually expose a dedicated primary endpoint or discovery path for writes. Resolve both names and compare the results with the address in the application's configuration.
bash
nslookup redis-primary.example.internal
nslookup redis-replica.example.internal
  3. Clear stale pools or reconnect with topology discovery
     Even with the correct endpoint configured, writes keep failing while long-lived pools hold connections to the old node. Test a write directly against the primary endpoint; if it succeeds here but the application still gets READONLY, the application is holding stale connections.
bash
redis-cli -h redis-primary.example.internal SET healthcheck ok
  4. Retest writes after forcing the client to refresh
     Restart the app or reset the client pool if needed, then confirm that the endpoint reports role:master and that application writes succeed.
bash
redis-cli -h redis-primary.example.internal INFO replication
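
The role check in these steps can be scripted instead of read by eye. This is a minimal sketch that parses INFO replication output from stdin; the function names redis_role and is_primary are hypothetical, and the carriage returns that Redis puts at the end of each INFO line are stripped before matching.

```shell
# Print the value of the "role" field from INFO replication output on stdin.
# Returns non-zero if no role line is present.
redis_role() {
  tr -d '\r' | awk -F: '
    $1 == "role" { print $2; found = 1 }
    END { exit !found }'
}

# Succeed only when the node reports itself as a primary (role:master).
is_primary() {
  [ "$(redis_role)" = "master" ]
}
```

For example, redis-cli -h redis-primary.example.internal INFO replication | is_primary succeeds only while that endpoint resolves to the current primary, so it can gate a deploy or a health check.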

Prevention

  • Use the managed primary endpoint, Sentinel, or cluster discovery instead of pinning one node
  • Keep DNS TTLs short and client topology refresh intervals low enough that clients pick up the new primary soon after a failover
  • Exercise failover in staging and confirm the app reconnects to the new primary
  • Monitor write errors separately from read availability during Redis maintenance
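
One way to monitor writes separately from reachability is a probe that issues a PING and a SET and reports each result on its own. The sketch below is an illustration under assumptions: the function name probe_redis and the REDIS_CLI override variable are hypothetical, and the healthcheck key is the one used in the steps above. It classifies replies by their text rather than by the redis-cli exit status.

```shell
# Command used to talk to Redis; override REDIS_CLI to substitute a stub.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Probe one host and print "ping=<up|down> write=<ok|readonly|fail>".
probe_redis() {
  host=$1
  ping_state=down
  write_state=fail

  # PING succeeds on primaries and replicas alike, so it only
  # tracks whether the node is reachable.
  reply=$("$REDIS_CLI" -h "$host" PING 2>&1) || true
  if [ "$reply" = "PONG" ]; then
    ping_state=up
  fi

  # SET succeeds only on a primary; a replica answers with READONLY.
  reply=$("$REDIS_CLI" -h "$host" SET healthcheck ok 2>&1) || true
  case $reply in
    OK) write_state=ok ;;
    *READONLY*) write_state=readonly ;;
  esac

  printf 'ping=%s write=%s\n' "$ping_state" "$write_state"
}
```

Calling probe_redis redis-primary.example.internal during a maintenance window would print ping=up write=readonly if the endpoint still points at a demoted node, which a plain reachability check would not reveal.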