## Introduction

Cassandra bloom filters are probabilistic data structures that let a read quickly rule out SSTables that cannot contain a given partition key. When the false positive rate is higher than configured (because of memory pressure, incorrect configuration, or SSTable proliferation), Cassandra performs unnecessary disk reads, which degrades read latency and increases I/O load.
## Symptoms

- Read latency higher than expected for point lookups
- `nodetool tablestats` shows `Bloom filter false positives` much higher than expected
- `Bloom filter false ratio` in tablestats exceeds the configured `bloom_filter_fp_chance`
- High disk I/O on read operations that should be negative lookups
- Memory pressure causing bloom filters to be evicted from memory
## Common Causes

- `bloom_filter_fp_chance` set too low (e.g., 0.001) without sufficient memory
- Too many small SSTables from SizeTieredCompactionStrategy
- Bloom filter off-heap memory evicted due to JVM pressure
- Incorrect `bloom_filter_fp_chance` for the workload pattern
- SSTable count growing without compaction keeping up
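To get a feel for why a very low `bloom_filter_fp_chance` needs more memory, the textbook sizing rule for an ideal Bloom filter can serve as a rough estimate: about `-ln(p) / (ln 2)^2` bits per key for a false positive chance `p`. The sketch below applies that rule. The helper name `bloom_mib` and the key count are hypothetical, and Cassandra's actual allocation may differ from this idealized figure.

```bash
#!/usr/bin/env bash
# Rough estimate only: ideal Bloom filter size in MiB for a given key count
# and false positive chance. bloom_mib is a hypothetical helper, not a Cassandra tool.
bloom_mib() {
  local keys="$1" fp_chance="$2"
  awk -v n="$keys" -v p="$fp_chance" 'BEGIN {
    bits_per_key = -log(p) / (log(2) ^ 2)   # textbook optimum for an ideal filter
    printf "%.1f\n", n * bits_per_key / 8 / 1024 / 1024
  }'
}

# Compare two settings for a hypothetical 100 million partition keys on one node.
bloom_mib 100000000 0.01
bloom_mib 100000000 0.001
```

Under this idealized model, moving from 0.01 to 0.001 multiplies the filter size by 1.5, which is the memory cost to weigh against the fewer wasted disk reads.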
## Step-by-Step Fix

1. **Check bloom filter statistics**:

   ```bash
   nodetool tablestats mykeyspace.mytable
   # Look for:
   # Bloom filter false positives: 1234
   # Bloom filter false ratio: 0.05
   # Bloom filter space used: 45678
   ```
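The comparison between the observed false ratio and the configured chance can be scripted. The following is a minimal sketch, assuming the tablestats output contains a `Bloom filter false ratio: <value>` line as shown in the example output; the function names `bloom_false_ratio` and `ratio_exceeds` are hypothetical helpers, and the sample text stands in for real `nodetool` output.

```bash
#!/usr/bin/env bash
# Extract the observed "Bloom filter false ratio" from tablestats text on stdin.
bloom_false_ratio() {
  awk -F': *' '/Bloom filter false ratio/ { print $2; exit }'
}

# Succeed (exit status 0) when the observed ratio exceeds the configured chance.
ratio_exceeds() {
  awk -v observed="$1" -v configured="$2" 'BEGIN { exit !(observed > configured) }'
}

# Sample text standing in for: nodetool tablestats mykeyspace.mytable
sample='  Bloom filter false positives: 1234
  Bloom filter false ratio: 0.05
  Bloom filter space used: 45678'

observed=$(printf '%s\n' "$sample" | bloom_false_ratio)
if ratio_exceeds "$observed" 0.01; then
  echo "false ratio $observed is above the configured 0.01"
fi
```

On a live node, the `printf` of the sample text would be replaced by the actual `nodetool tablestats mykeyspace.mytable` call, and `0.01` by the table's configured `bloom_filter_fp_chance`.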
2. **Adjust the bloom filter false positive chance**:

   ```sql
   -- Check current setting
   SELECT bloom_filter_fp_chance
   FROM system_schema.tables
   WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable';

   -- Adjust to a more realistic value (0.01 is typical)
   ALTER TABLE mykeyspace.mytable WITH bloom_filter_fp_chance = 0.01;

   -- This takes effect on new SSTables created after compaction
   ```
3. **Compact SSTables to rebuild bloom filters**:

   ```bash
   # Run major compaction to rebuild bloom filters with new settings
   nodetool compact mykeyspace mytable

   # Verify the new false positive rate
   nodetool tablestats mykeyspace.mytable
   ```
4. **Ensure sufficient off-heap memory for bloom filters**:

   ```yaml
   # /etc/cassandra/cassandra.yaml
   # Ensure file_cache_size_in_mb is adequate
   # (defaults to the smaller of 1/4 of the heap or 512 MB; increase for large datasets)
   file_cache_size_in_mb: 512

   # Also check the JVM heap: bloom filters use off-heap memory
   # but are affected by overall memory pressure
   ```
5. **Monitor bloom filter effectiveness**:

   ```bash
   # Watch bloom filter stats in real-time
   watch -n 10 'nodetool tablestats mykeyspace.mytable | grep -A3 "Bloom filter"'
   ```
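When watching the stats over time, the change in the false positive counter between two samples is often more telling than its absolute value. The sketch below assumes that `Bloom filter false positives` is a cumulative counter, which is worth confirming for your Cassandra version. The helpers `bloom_fp_count` and `fp_delta` are hypothetical, and the two `printf` lines stand in for two `nodetool tablestats` calls taken some interval apart.

```bash
#!/usr/bin/env bash
# Pull the "Bloom filter false positives" counter from tablestats text on stdin.
bloom_fp_count() {
  awk -F': *' '/Bloom filter false positives/ { print $2; exit }'
}

# Print how many new false positives occurred between two samples.
fp_delta() {
  echo $(( $2 - $1 ))
}

# Stand-ins for two samples; on a live node each would be
# nodetool tablestats mykeyspace.mytable | bloom_fp_count
before=$(printf '  Bloom filter false positives: 1234\n' | bloom_fp_count)
after=$(printf '  Bloom filter false positives: 1300\n' | bloom_fp_count)

fp_delta "$before" "$after"   # prints 66
```

A delta that keeps growing quickly after compaction suggests the new `bloom_filter_fp_chance` has not yet been applied to all SSTables, or that memory is still the limiting factor.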