## Introduction

Cassandra's `QUORUM` consistency level requires responses from a majority of the replicas for a partition: `floor(RF / 2) + 1`, where RF is the replication factor. When enough replicas become unavailable (due to network partitions, hardware failures, or maintenance), the quorum cannot be reached, and reads and writes at `QUORUM` for the affected token ranges fail with `UnavailableException`.
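The quorum arithmetic can be sketched in a few lines of Python. The helper names `quorum_size` and `max_replicas_down` are illustrative only and are not part of any Cassandra driver.

```python
def quorum_size(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def max_replicas_down(replication_factor: int) -> int:
    """Replicas of a partition that can be down while QUORUM still succeeds."""
    return replication_factor - quorum_size(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}, tolerates {max_replicas_down(rf)} down")
```

With RF=3 this gives a quorum of 2 and one tolerated failure; with RF=2 the quorum is also 2, so no replica may be down.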
## Symptoms

- `UnavailableException: Cannot achieve consistency level QUORUM`
- Application reads and writes failing simultaneously
- `nodetool status` shows one or more nodes as DOWN (`DN`)
- The error occurs when RF=3 and two or more replicas of a partition are down, or when RF=2 and any replica is down
- Cascading failures as application retries amplify the load
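Retry amplification is commonly damped with capped exponential backoff and jitter. The sketch below is a generic illustration of that idea; `backoff_delays` and its parameters are hypothetical names, not part of the Cassandra driver, which ships its own retry and reconnection policies.

```python
import random
from typing import Iterator, Optional


def backoff_delays(base: float = 0.1, cap: float = 5.0, attempts: int = 5,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one sleep duration per retry: capped exponential backoff with full jitter."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: a random delay in [0, ceiling] keeps clients from retrying in lockstep.
        yield rng.uniform(0.0, ceiling)


for delay in backoff_delays(rng=random.Random(42)):
    print(f"retry after {delay:.3f}s")
```

An application would sleep for each yielded delay between attempts and give up once the generator is exhausted.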
## Common Causes

- Replication factor too low for the number of nodes that can fail
- Multiple nodes failing simultaneously (power outage, network partition)
- Wrong consistency level for the availability requirements
- Data center-specific outage affecting all replicas in that DC
- Misconfigured replication strategy not distributing replicas across failure domains
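To see why a single data center outage can break `QUORUM` while `LOCAL_QUORUM` keeps working, the check below compares the two for a `NetworkTopologyStrategy` replication map. The function names and the example map (`dc1`, `dc2`) are assumptions made for illustration, and the model treats a DC as either fully up or fully down.

```python
from typing import Dict, Set


def quorum(replicas: int) -> int:
    """Majority of the given replica count."""
    return replicas // 2 + 1


def quorum_survives(replication: Dict[str, int], down_dcs: Set[str]) -> bool:
    """QUORUM counts replicas across all DCs; check whether enough remain alive."""
    total = sum(replication.values())
    alive = sum(rf for dc, rf in replication.items() if dc not in down_dcs)
    return alive >= quorum(total)


def local_quorum_survives(replication: Dict[str, int], local_dc: str,
                          down_dcs: Set[str]) -> bool:
    """LOCAL_QUORUM only needs a majority of the replicas in the coordinator's own DC."""
    return local_dc not in down_dcs and replication.get(local_dc, 0) > 0


replication = {"dc1": 3, "dc2": 3}  # hypothetical two-DC keyspace
print(quorum_survives(replication, {"dc1"}))               # False: 3 alive, 4 needed
print(local_quorum_survives(replication, "dc2", {"dc1"}))  # True: dc2 is intact
```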
## Step-by-Step Fix

1. **Check replication factor and node status:**
```sql
-- Check keyspace replication (run in cqlsh)
DESCRIBE KEYSPACE mykeyspace;

-- Expected: WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
```
2. **Check which nodes are down:**
```bash
nodetool status
# DN = Down/Normal, UN = Up/Normal
# With RF=3, QUORUM needs 2 of 3 replicas, so you can tolerate 1 replica down
```
3. **Temporarily lower the consistency level for availability:**
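```python
from cassandra import ConsistencyLevel, Unavailable
from cassandra.query import SimpleStatement

# Assumes `session` is a connected Session and `query` is the CQL string to run.
# Fall back to LOCAL_ONE when LOCAL_QUORUM is not achievable.
# The level is set per statement so the session default is not left lowered.
# LOCAL_ONE can return stale data, so use it only where that is acceptable.
try:
    session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.LOCAL_QUORUM))
except Unavailable:
    session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.LOCAL_ONE))
```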
4. **Use serial consistency for conditional updates:**
```python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# For LWTs (lightweight transactions), the serial consistency level governs the Paxos phase.
# LOCAL_SERIAL limits that phase to the local data center.
statement = SimpleStatement(
    "INSERT INTO mytable (id, value) VALUES (%s, %s) IF NOT EXISTS",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL,
)
# SimpleStatement takes %s placeholders; ? is only valid in prepared statements
session.execute(statement, (1, 'value'))
```
5. **Restore nodes and run repair:**
```bash
# Bring the downed nodes back online
sudo systemctl start cassandra

# Wait for them to rejoin (state UN)
nodetool status

# Repair to synchronize data; with -pr, run this on every node in turn
nodetool repair -pr mykeyspace
```
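Waiting for nodes to rejoin can be scripted by reading the state codes in the `nodetool status` output. This is a minimal sketch that assumes the usual layout, where each node line starts with a two-letter code such as `UN` or `DN` followed by the node address. The function name `down_nodes` is hypothetical, and `nodetool` must be on the `PATH` when the function is called without an argument.

```python
import re
import subprocess
from typing import List, Optional

# Node lines start with a state code: U/D (up/down) followed by N/L/J/M
# (normal/leaving/joining/moving), then whitespace and the node address.
_NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def down_nodes(status_output: Optional[str] = None) -> List[str]:
    """Return the addresses of nodes whose state code starts with 'D' (down)."""
    if status_output is None:
        # Assumes `nodetool` is installed and on PATH on this host.
        status_output = subprocess.run(
            ["nodetool", "status"], capture_output=True, text=True, check=True
        ).stdout
    down = []
    for line in status_output.splitlines():
        match = _NODE_LINE.match(line.strip())
        if match and match.group(1).startswith("D"):
            down.append(match.group(2))
    return down
```

Calling `down_nodes()` in a loop until it returns an empty list is one way to gate the repair step.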