## Introduction

Cassandra QUORUM consistency requires responses from a majority of replicas (N/2 + 1, using integer division). When nodes become unavailable—due to network partitions, hardware failures, or maintenance—the quorum may not be achievable, causing all reads and writes at that consistency level to fail with `UnavailableException`.
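The quorum arithmetic can be made concrete with a small helper. This is an illustrative sketch, not part of any driver API:

```python
# Quorum math for a single datacenter: quorum = RF // 2 + 1.
# These helpers are illustrative, not part of the Cassandra driver.

def quorum_size(replication_factor: int) -> int:
    """Number of replica responses QUORUM requires."""
    return replication_factor // 2 + 1

def tolerable_failures(replication_factor: int) -> int:
    """How many replicas can be down while QUORUM still succeeds."""
    return replication_factor - quorum_size(replication_factor)

# RF=3: quorum needs 2 replicas, so 1 node can be down.
# RF=2: quorum needs 2 replicas, so no node can be down.
print(quorum_size(3), tolerable_failures(3))  # 2 1
```

Note that RF=2 gives no fault tolerance at QUORUM, which is why the symptoms below list "RF=2 and 1+ nodes down" as a failure case.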

## Symptoms

- `UnavailableException: Cannot achieve consistency level QUORUM`
- Application reads and writes failing simultaneously
- `nodetool status` shows one or more nodes as DOWN
- Error occurs when RF=3 and 2+ nodes are down, or RF=2 and 1+ nodes are down
- Cascading failures as application retries amplify the load

## Common Causes

- Replication factor too low for the number of nodes that can fail
- Multiple nodes failing simultaneously (power outage, network partition)
- Wrong consistency level for the availability requirements
- Data center-specific outage affecting all replicas in that DC
- Misconfigured replication strategy not distributing replicas across failure domains

## Step-by-Step Fix

1. **Check replication factor and node status:**

   ```sql
   -- Check keyspace replication
   DESCRIBE KEYSPACE mykeyspace;

   -- Expected: WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
   ```
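If the replication factor turns out to be too low, it can be raised with `ALTER KEYSPACE`, followed by a repair so the new replicas receive data. A hedged sketch that builds the CQL string (the keyspace and datacenter names are placeholders):

```python
# Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.
# Keyspace and datacenter names here are placeholders; after raising RF
# you must run `nodetool repair` so the new replicas receive data.

def alter_rf_cql(keyspace: str, dc_replication: dict) -> str:
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_replication.items()))
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}}"
    )

print(alter_rf_cql("mykeyspace", {"dc1": 3}))
# ALTER KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
```

The statement would then be run via cqlsh or `session.execute(...)`; raising RF on a live keyspace is safe, but reads at QUORUM may be inconsistent until the repair completes.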

2. **Check which nodes are down:**

   ```bash
   nodetool status
   # DN = Down, UN = Up Normal
   # Calculate: if RF=3, you can tolerate 1 node down for QUORUM (need 2 of 3)
   ```

3. **Temporarily lower consistency level for availability:**

   ```python
   from cassandra import ConsistencyLevel, Unavailable

   # Fall back to LOCAL_ONE when LOCAL_QUORUM is not achievable
   try:
       session.default_consistency_level = ConsistencyLevel.LOCAL_QUORUM
       session.execute(query)
   except Unavailable:
       session.default_consistency_level = ConsistencyLevel.LOCAL_ONE
       session.execute(query)
   ```
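The try/except fallback can be generalized into an ordered chain of consistency levels. A self-contained sketch, using a stand-in `Unavailable` class and a fake executor so it runs without a cluster (in real code you would import `cassandra.Unavailable` and wrap `session.execute`):

```python
# A sketch generalizing the fallback into an ordered chain of levels.
# `Unavailable` stands in for cassandra.Unavailable so the sketch is
# self-contained; in real code, import it from the driver.

class Unavailable(Exception):
    """Stand-in for cassandra.Unavailable."""

def execute_with_fallback(execute, levels):
    """Try execute(level) for each level in order; return (result, level used)."""
    last_error = None
    for level in levels:
        try:
            return execute(level), level
        except Unavailable as exc:
            last_error = exc
    raise last_error

# Fake executor for illustration: only succeeds at LOCAL_ONE.
def fake_execute(level):
    if level != "LOCAL_ONE":
        raise Unavailable("cannot achieve " + level)
    return "rows"

result, used = execute_with_fallback(fake_execute, ["LOCAL_QUORUM", "LOCAL_ONE"])
print(used)  # LOCAL_ONE
```

Keep in mind this trades consistency for availability: reads served at LOCAL_ONE may return stale data until the cluster heals and repair runs.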

4. **Use serial consistency for conditional updates:**

   ```python
   from cassandra import ConsistencyLevel
   from cassandra.query import SimpleStatement

   # For LWT (lightweight transactions), the Paxos phase uses serial consistency
   statement = SimpleStatement(
       "INSERT INTO mytable (id, value) VALUES (%s, %s) IF NOT EXISTS",
       consistency_level=ConsistencyLevel.LOCAL_QUORUM,
       serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL,
   )
   session.execute(statement, (1, 'value'))
   ```
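An LWT result reports whether the condition held: the response carries a boolean column literally named `[applied]`, and a rejected insert also echoes the existing row. A small sketch interpreting such a row, using hand-built dicts (as the driver's `dict_factory` row factory would return) rather than a live cluster:

```python
# Interpreting an LWT (IF NOT EXISTS) result row. The boolean column
# is literally named '[applied]'; when False, the response also echoes
# the existing row's values. Sample rows below are hand-built for
# illustration (shaped like dict_factory output).

def lwt_applied(row: dict) -> bool:
    """True if the conditional write actually took effect."""
    return bool(row.get('[applied]'))

assert lwt_applied({'[applied]': True}) is True
# A rejected insert returns the current row alongside [applied] = False:
assert lwt_applied({'[applied]': False, 'id': 1, 'value': 'existing'}) is False
```

Note that LWTs add a Paxos round on top of the normal write, so they have stricter availability requirements than plain QUORUM writes.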

5. **Restore nodes and run repair:**

   ```bash
   # Bring downed nodes back online
   sudo systemctl start cassandra

   # Wait for them to rejoin
   nodetool status

   # Repair to synchronize replicas (-pr = primary range only)
   nodetool repair -pr mykeyspace
   ```
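Checking how many nodes are down can be scripted by parsing `nodetool status` output. A sketch under the assumption that each node line starts with a two-letter status/state code (`U`/`D` plus `N`/`L`/`J`/`M`), as recent Cassandra versions print; the sample text is hand-built:

```python
# Count node states (e.g. UN = Up Normal, DN = Down Normal) from
# `nodetool status` output. Assumes each node line starts with a
# two-letter status/state code; the sample output is hand-built.

def count_states(status_output: str) -> dict:
    counts = {}
    for line in status_output.splitlines():
        parts = line.split()
        if parts and len(parts[0]) == 2 and parts[0][0] in "UD" and parts[0][1] in "NLJM":
            counts[parts[0]] = counts.get(parts[0], 0) + 1
    return counts

sample = """Datacenter: dc1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns   Host ID  Rack
UN  10.0.0.1   1.2 GiB  256     33.3%  aaaa     rack1
UN  10.0.0.2   1.1 GiB  256     33.3%  bbbb     rack1
DN  10.0.0.3   1.3 GiB  256     33.4%  cccc     rack2
"""

print(count_states(sample))  # {'UN': 2, 'DN': 1}
```

Combined with the quorum math above (RF=3 tolerates one down node), such a check can feed an alert before availability is actually lost.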

## Prevention

- Set replication factor to at least 3 for production keyspaces
- Use NetworkTopologyStrategy to distribute replicas across failure domains
- Implement dynamic consistency level selection based on cluster health
- Monitor node availability and alert before QUORUM is at risk
- Test failure scenarios: what happens when 1, 2, or N-1 nodes go down
- Use LOCAL_QUORUM instead of QUORUM for multi-DC deployments
- Document the minimum number of live replicas required for each consistency level