What's Actually Happening

Cassandra queries time out before completing. The coordinator does not receive enough replica responses within the configured timeout, so the operation fails with a timeout error instead of returning results.

The Error You'll See

```bash
# keyspace_name is a placeholder; KEYSPACE itself is a reserved word in CQL
$ cqlsh -e "SELECT * FROM keyspace_name.users WHERE user_id = 123"

QueryTimedOut: ERROR from server: code=1200 [Query timeout during read query at consistency LOCAL_QUORUM] message="Operation timed out - received only 0 responses."
```

Write timeout:

```bash
WriteTimeout: Error from server: code=1100 [Write timeout during write at consistency LOCAL_QUORUM] message="Operation timed out - received only 1 responses."
```

Coordinator timeout:

```bash
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses]
```

No hosts available:

```bash
NoHostAvailable: All host(s) tried for query failed
```

Why This Happens

  1. High consistency level - Requires more replicas to respond
  2. Replica unavailable - Not enough nodes online
  3. Network latency - Slow inter-node communication
  4. Large partitions - Query scanning too much data
  5. Hot partitions - Uneven data distribution
  6. Resource exhaustion - Node CPU or memory overloaded

Step 1: Check Cluster Status

```bash
# Check cluster status:
nodetool status

# Check ownership for a specific keyspace:
nodetool status keyspace_name

# Check each datacenter:
nodetool status | grep -A20 "Datacenter"

# Check ring:
nodetool ring

# Check gossip info:
nodetool gossipinfo

# Check streaming and messaging activity between nodes:
nodetool netstats

# Check load on nodes:
nodetool info | grep -E "Load|Heap"

# Check for joining nodes (they show status UJ):
nodetool status | grep "^UJ"

# Verify replication factor:
cqlsh -e "DESCRIBE KEYSPACE keyspace_name"
```
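When the cluster has many nodes, it can help to filter the status output programmatically. The sketch below picks out every node that is not Up/Normal from `nodetool status` text. The sample output is illustrative only (real output has more columns and varies by version), and the function name `unhealthy_nodes` is a hypothetical helper, not part of any Cassandra tooling.

```python
import re

# Illustrative sample only; real `nodetool status` output differs by version.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns   Host ID    Rack
UN  10.0.0.1   1.2 GiB   256     33.3%  aaaa-1111  rack1
DN  10.0.0.2   1.1 GiB   256     33.3%  bbbb-2222  rack1
UJ  10.0.0.3   0.4 GiB   256     ?      cccc-3333  rack2
"""

# A node line starts with a two-letter code:
# U/D (up or down) followed by N/L/J/M (normal, leaving, joining, moving).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def unhealthy_nodes(status_output):
    """Return (code, address) for every node that is not Up/Normal (UN)."""
    result = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match and match.group(1) != "UN":
            result.append((match.group(1), match.group(2)))
    return result


print(unhealthy_nodes(SAMPLE_STATUS))
# [('DN', '10.0.0.2'), ('UJ', '10.0.0.3')]
```

In practice the input would come from running `nodetool status` and capturing its output; an empty result means every listed node is UN.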

Step 2: Check Consistency Level

```sql
-- Check current consistency (cqlsh command):
CONSISTENCY

-- Set lower consistency for debugging:
CONSISTENCY ONE

-- Common consistency levels:
-- ONE: One replica must respond
-- QUORUM: Majority of replicas must respond
-- LOCAL_QUORUM: Majority in local DC
-- ALL: All replicas must respond

-- CQL 3 has no USING CONSISTENCY clause. In cqlsh, the level set with
-- the CONSISTENCY command applies to the statements that follow.
-- For read:
SELECT * FROM users WHERE user_id = 123;

-- For write:
INSERT INTO users (user_id, name) VALUES (123, 'John');

-- Per-operation timeout: USING TIMEOUT is ScyllaDB syntax and is not
-- accepted by Apache Cassandra. For cqlsh, raise the client-side
-- timeout (in seconds) instead:
--   cqlsh --request-timeout=30

-- Driver configuration for timeout (application code, not cqlsh):
-- Python (cassandra-driver):
--   cluster = Cluster(['localhost'], connect_timeout=30)
--   session = cluster.connect()
--   session.default_timeout = 30

-- Java (driver 3.x):
--   new SocketOptions().setReadTimeoutMillis(30000)
```

Step 3: Check Replica Availability

```bash
# Check if enough replicas available:
nodetool status

# Check effective replication:
nodetool describering keyspace_name

# Check replication factor:
cqlsh -e "SELECT * FROM system_schema.keyspaces WHERE keyspace_name = 'keyspace_name'"

# Ensure nodes with replicas are up:
# RF=3 with QUORUM requires 2 replicas
# If RF=3 and only 1 replica is up, QUORUM fails

# Check for nodes that are down or not in the Normal state:
nodetool status | grep -E "^(D|U[LJM])"

# Repair data if nodes were down:
nodetool repair keyspace_name

# Check hinted handoff thread pools:
nodetool tpstats | grep -i hint

# Check whether hinted handoff is enabled:
nodetool statushandoff

# View pending hints (recent releases only):
nodetool listpendinghints
```
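The replica arithmetic in the comments above can be checked with a few lines of code. This is a minimal sketch: `quorum` and `can_satisfy` are hypothetical helper names, and for `LOCAL_QUORUM` the replication factor passed in is assumed to be that of the local datacenter.

```python
def quorum(replication_factor):
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_satisfy(level, replication_factor, replicas_up):
    """Return True if enough replicas are up for the consistency level.

    For LOCAL_QUORUM, pass the local datacenter's RF and live replica count.
    """
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "LOCAL_QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    return replicas_up >= required


print(quorum(3))                     # 2
print(can_satisfy("QUORUM", 3, 1))   # False: only 1 of the 2 required replicas is up
print(can_satisfy("ONE", 3, 1))      # True
```

With RF=3 a single down replica still leaves QUORUM reachable, while `ALL` fails as soon as any replica is unavailable.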

Step 4: Optimize Query Performance

```sql
-- Use partition key in WHERE clause:
-- BAD: Full table scan
SELECT * FROM users;

-- GOOD: Use partition key
SELECT * FROM users WHERE user_id = 123;

-- Use clustering columns (assumes created_at is a clustering column):
SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';

-- Allow filtering (use carefully; here the partition key limits the scan to one partition):
SELECT * FROM users WHERE user_id = 123 AND email = 'test@example.com' ALLOW FILTERING;

-- Use secondary index for non-partition queries (use sparingly: the query may contact every node):
CREATE INDEX ON users (email);
SELECT * FROM users WHERE email = 'test@example.com';

-- Use IN clause sparingly:
-- BAD: SELECT * FROM users WHERE user_id IN (1, 2, 3, ..., 100);
-- GOOD: One query per partition key, issued in parallel from the driver
SELECT * FROM users WHERE user_id = 1;
SELECT * FROM users WHERE user_id = 2;

-- Limit results:
SELECT * FROM users WHERE user_id = 123 LIMIT 100;

-- Use paging (Python driver setting, not CQL):
--   session.default_fetch_size = 100
```

Step 5: Check Partition Size

```bash
# Check partition size:
nodetool tablestats keyspace_name.table_name

# Look for:
#   Compacted partition maximum bytes
#   Compacted partition mean bytes
# Large partitions cause timeouts

# Check for large partitions:
nodetool tablestats keyspace_name.table_name | grep -i "partition"

# Avoid COMPACT STORAGE: it is deprecated and does not reduce partition size.
# Define the table normally (CQL, run in cqlsh):
#   CREATE TABLE users ( user_id int PRIMARY KEY, name text );

# For large partitions, consider:
# 1. Bucketing data
# 2. Time-series bucketing
# 3. Reducing partition size

# Check partition size and cell count percentiles:
nodetool tablehistograms keyspace_name table_name

# Estimate the number of rows in a partition:
cqlsh -e "SELECT COUNT(*) FROM keyspace_name.table_name WHERE partition_key = 'value'"
```
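Time-series bucketing, mentioned above, usually means adding a coarse time component to the partition key so that no single partition grows without bound. The sketch below derives a daily bucket in application code. It assumes a hypothetical table whose primary key is `((sensor_id, day), event_time)`; the table, column names, and helper names are illustrative and do not come from the original schema.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(event_time):
    """Return a YYYY-MM-DD bucket in UTC for a timezone-aware datetime."""
    return event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id, event_time):
    """Build the composite partition key (sensor_id, day) for one event."""
    return (sensor_id, day_bucket(event_time))


def buckets_between(start, end):
    """List every daily bucket that a time-range read has to query."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


ts = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)
print(partition_key("sensor-7", ts))
# ('sensor-7', '2024-01-15')
print(buckets_between(ts, ts + timedelta(days=2)))
# ['2024-01-15', '2024-01-16', '2024-01-17']
```

The trade-off is that a range read now issues one query per bucket, so the bucket width should match the typical query window.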

Step 6: Check Network Latency

```bash
# Check internode streaming and messaging activity:
nodetool netstats

# Check latency between nodes (nodetool has no latency command):
ping -c 5 node1

# Test TCP connectivity (7000 = internode, 9042 = CQL clients):
nc -zv node1 7000
nc -zv node1 9042

# Check if nodes in same DC:
nodetool status | grep -E "Datacenter|UN"

# Cross-DC queries have higher latency:
# Use LOCAL_QUORUM instead of QUORUM

# Check for network errors:
netstat -i

# Test bandwidth between nodes (requires an iperf server running on node1):
iperf -c node1

# Check if nodes are on same rack:
nodetool status

# Network interface stats:
grep eth0 /proc/net/dev

# Firewall check:
iptables -L -n | grep 7000
```
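The TCP connectivity test can also be scripted so that it reports how long the connection took, which gives a rough view of network latency from the application host. This is a sketch using only the standard library; `tcp_connect_ms` is a hypothetical helper name, and the measured time covers only the TCP handshake, not a CQL request.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if it fails."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None
```

For example, `tcp_connect_ms("node1", 9042)` returns a number of milliseconds when the CQL port is reachable and `None` when the port is closed, filtered, or the host name does not resolve.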

Step 7: Check Resource Usage

```bash
# Check node resources:
nodetool info

# Check heap usage:
nodetool info | grep "Heap Memory"

# Check CPU:
top -bn1 | head -20

# Check disk I/O:
iostat -x 1

# Check disk space:
df -h /var/lib/cassandra

# Check compaction:
nodetool compactionstats

# Check pending compactions:
nodetool compactionstats | grep -i pending

# Check thread pools:
nodetool tpstats

# Look for blocked threads:
nodetool tpstats | grep -i blocked

# Check read/write latency:
nodetool tablestats -- keyspace_name.table_name | grep -i latency
```
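Thread pool output is easier to act on when only the backed-up pools are shown. The sketch below scans `nodetool tpstats` text for pools with pending or blocked tasks. The sample is illustrative only and assumes the common column layout (pool name, active, pending, completed, blocked, all time blocked); `backed_up_pools` is a hypothetical helper name.

```python
# Illustrative sample only; real `nodetool tpstats` output lists more pools.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending   Completed   Blocked  All time blocked
ReadStage                         4        37     1203847         0                 0
MutationStage                     0         0      983211         0                 0
CompactionExecutor                2        11        4821         0                 0
"""


def backed_up_pools(tpstats_output):
    """Return pools that have pending or blocked tasks."""
    flagged = {}
    for line in tpstats_output.splitlines():
        parts = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # this skips the header and any other sections of the output.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, _active, pending, _completed, blocked, _all_time = parts
        if int(pending) > 0 or int(blocked) > 0:
            flagged[name] = {"pending": int(pending), "blocked": int(blocked)}
    return flagged


print(backed_up_pools(SAMPLE_TPSTATS))
# {'ReadStage': {'pending': 37, 'blocked': 0}, 'CompactionExecutor': {'pending': 11, 'blocked': 0}}
```

A persistently high pending count on `ReadStage` or `MutationStage` suggests the node cannot keep up with requests, which matches the resource exhaustion cause listed earlier.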

Step 8: Adjust Timeout Settings

```yaml
# In cassandra.yaml (names below are the pre-4.1 form;
# Cassandra 4.1+ uses names such as read_request_timeout: 5000ms):

# Read timeout (default 5000ms):
read_request_timeout_in_ms: 10000

# Write timeout (default 2000ms):
write_request_timeout_in_ms: 5000

# Range timeout (default 10000ms):
range_request_timeout_in_ms: 20000

# Counter write timeout (default 5000ms):
counter_write_request_timeout_in_ms: 10000

# Truncate timeout (default 60000ms):
truncate_request_timeout_in_ms: 120000

# CAS contention timeout (default 1000ms):
cas_contention_timeout_in_ms: 10000

# Timeout for other operations (default 10000ms):
request_timeout_in_ms: 20000

# Restart Cassandra after changes (shell command, run on each node):
#   systemctl restart cassandra
```
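Before and after editing the file, it can be useful to list which timeouts differ from their defaults. The sketch below does this with a plain line-by-line parse rather than a YAML library, so it only handles flat `key: value` lines. The defaults are the ones stated in the comments above for the `_in_ms` settings; `changed_timeouts` and the sample text are hypothetical.

```python
# Defaults for the pre-4.1 `_in_ms` setting names, in milliseconds.
DEFAULTS_MS = {
    "read_request_timeout_in_ms": 5000,
    "write_request_timeout_in_ms": 2000,
    "range_request_timeout_in_ms": 10000,
    "counter_write_request_timeout_in_ms": 5000,
    "truncate_request_timeout_in_ms": 60000,
}

# Illustrative fragment, not a complete cassandra.yaml.
SAMPLE_YAML = """\
read_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000   # unchanged
range_request_timeout_in_ms: 20000
"""


def changed_timeouts(yaml_text):
    """Return {setting: (default_ms, configured_ms)} for non-default timeouts."""
    changed = {}
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in DEFAULTS_MS and value.isdigit() and int(value) != DEFAULTS_MS[key]:
            changed[key] = (DEFAULTS_MS[key], int(value))
    return changed


print(changed_timeouts(SAMPLE_YAML))
# {'read_request_timeout_in_ms': (5000, 10000), 'range_request_timeout_in_ms': (10000, 20000)}
```

Raising server-side timeouts hides slow queries rather than fixing them, so a report like this is most useful as a record of what was changed while the underlying cause is investigated.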

Step 9: Run Repair and Cleanup

```bash
# Repair data inconsistencies:
nodetool repair keyspace_name

# Full repair:
nodetool repair --full

# Repair specific table:
nodetool repair keyspace_name table_name

# Check repair status (Cassandra 4.0+):
nodetool repair_admin list

# Remove data the node no longer owns (run after adding nodes):
nodetool cleanup keyspace_name

# Cleanup all keyspaces:
nodetool cleanup

# Rebuild by streaming from another datacenter
# (used when adding a datacenter, not for a node that was briefly down):
nodetool rebuild -- datacenter_name

# Decommission old nodes:
nodetool decommission

# Check for tombstones:
nodetool tablestats keyspace_name.table_name | grep -i tombstone

# Run garbage collection:
nodetool garbagecollect keyspace_name table_name
```

Step 10: Cassandra Query Verification Script

```bash
# Create verification script:
cat << 'EOF' > /usr/local/bin/check-cassandra-query.sh
#!/bin/bash

KEYSPACE=${1:-"my_keyspace"}
TABLE=${2:-""}

echo "=== Cluster Status ==="
nodetool status

echo ""
echo "=== Keyspace Info ==="
cqlsh -e "DESCRIBE KEYSPACE $KEYSPACE" 2>/dev/null | head -20

echo ""
echo "=== Table Stats ==="
if [ -n "$TABLE" ]; then
  nodetool tablestats "$KEYSPACE.$TABLE" 2>/dev/null | head -30
fi

echo ""
echo "=== Current Consistency ==="
cqlsh -e "CONSISTENCY" 2>/dev/null

echo ""
echo "=== Gossip Info ==="
nodetool gossipinfo 2>/dev/null | head -20

echo ""
echo "=== Thread Pool Stats ==="
nodetool tpstats 2>/dev/null | head -20

echo ""
echo "=== Compaction Stats ==="
nodetool compactionstats 2>/dev/null | head -10

echo ""
echo "=== Heap Usage ==="
nodetool info 2>/dev/null | grep -E "Heap|Load"

echo ""
echo "=== Pending Tasks ==="
nodetool tpstats 2>/dev/null | grep -E "ReadStage|MutationStage|Pending"

echo ""
echo "=== Latency Stats ==="
nodetool proxyhistograms 2>/dev/null | head -15

echo ""
echo "=== Recommendations ==="
echo "1. Use partition key in WHERE clause"
echo "2. Lower consistency level if acceptable"
echo "3. Check replica availability"
echo "4. Monitor partition size"
echo "5. Run repair if data inconsistent"
echo "6. Increase timeout in cassandra.yaml"
echo "7. Check for network latency between nodes"
EOF

chmod +x /usr/local/bin/check-cassandra-query.sh

# Usage:
/usr/local/bin/check-cassandra-query.sh my_keyspace my_table
```

Cassandra Query Timeout Checklist

| Check | Expected |
| --- | --- |
| Cluster status | All nodes UN |
| Replicas | Enough available for consistency |
| Partition key | Used in WHERE clause |
| Timeout settings | Adequate for workload |
| Consistency level | Appropriate for replicas |
| Network | Low latency between nodes |
| Resources | CPU/memory not exhausted |

Verify the Fix

```bash
# After fixing Cassandra query timeout

# 1. Check cluster healthy
nodetool status
# Expected: all nodes UN

# 2. Test query (keyspace_name is a placeholder)
cqlsh -e "SELECT * FROM keyspace_name.users WHERE user_id = 123 LIMIT 10"
# Expected: returns results

# 3. Check latency
nodetool proxyhistograms
# Expected: acceptable latency

# 4. Verify consistency
cqlsh -e "CONSISTENCY"
# Expected: appropriate level set

# 5. Test write
cqlsh -e "INSERT INTO keyspace_name.users (user_id, name) VALUES (456, 'test')"
# Expected: write succeeds

# 6. Check for dropped messages (tpstats reports drops, not timeouts)
nodetool tpstats | grep -A 15 "Message type"
# Expected: dropped counts are not increasing
```

  • [Fix Cassandra Nodes Down](/articles/fix-cassandra-nodes-down)
  • [Fix MongoDB Slow Query](/articles/fix-mongodb-slow-query)
  • [Fix PostgreSQL Slow Query](/articles/fix-postgresql-slow-query)