Home / Cassandra / Cassandra SSTable Corrupted on Disk After Unexpected Node Restart

Cassandra

Cassandra SSTable Corrupted on Disk After Unexpected Node Restart

How to detect and recover from Cassandra SSTable corruption on disk after unexpected node restarts.

Today3 min read

Abstract illustration for a troubleshooting knowledge base category.

Introduction Cassandra SSTables (Sorted String Tables) are immutable on-disk data files. When a node restarts unexpectedly—due to power loss, OOM kill, or hardware failure—SSTables being written at the time can become corrupted. Corrupted SSTables cause read errors, data inconsistency, and may prevent the node from starting.

Symptoms - `CorruptSSTableException` in Cassandra logs during startup - `SSTableReader` fails to open specific SSTable files - Read queries return errors for specific partition keys - `nodetool verify` reports `SSTable is corrupt` - Node starts but certain ranges are unavailable

Common Causes - Unclean shutdown during SSTable flush from memtable - Disk I/O error during write completing SSTable - Filesystem corruption affecting SSTable files - Hardware failure (bad disk sectors, controller issues) - Compression corruption in compressed SSTables

Step-by-Step Fix 1. **Identify corrupted SSTables": ```bash # Run verify on the affected table nodetool verify mykeyspace mytable

# Check the system log for specific corrupted files grep -i "corrupt|CorruptSSTable" /var/log/cassandra/system.log

# List SSTable files for the table ls -la /var/lib/cassandra/data/mykeyspace/mytable-*/ ```

1.**Use sstablescrub to rebuild corrupted SSTables":
2.```bash
3.# Stop Cassandra on the affected node
4.sudo systemctl stop cassandra

# Run scrub (rebuilds SSTables, discarding corrupted data) sstablescrub mykeyspace mytable

# Restart Cassandra sudo systemctl start cassandra

# Verify the repair nodetool verify mykeyspace mytable ```

1.**Use sstableofflinerelevel for severe corruption":
2.```bash
3.# Stop Cassandra
4.sudo systemctl stop cassandra

# Rebuild SSTables offline sstableofflinerelevel mykeyspace mytable

# Start Cassandra and verify sudo systemctl start cassandra nodetool verify mykeyspace mytable ```

1.**If data is lost, rebuild the node from other replicas":
2.```bash
3.# Clear the corrupted data
4.sudo systemctl stop cassandra
5.sudo rm -rf /var/lib/cassandra/data/mykeyspace/mytable-*

# Start Cassandra sudo systemctl start cassandra

# Rebuild from other replicas nodetool rebuild -- mykeyspace mytable ```

1.**Repair the entire keyspace after recovery":
2.```bash
3.nodetool repair mykeyspace -pr
4.`

Prevention - Use reliable storage with battery-backed write cache - Configure `commitlog_sync: periodic` with `commitlog_sync_period_in_ms: 10000` - Monitor disk health with SMART and filesystem checks - Use `nodetool verify` as part of regular maintenance - Ensure clean shutdown procedures with proper signal handling - Use RAID or replication at the storage level - Test disaster recovery procedures regularly with simulated disk failures