## Introduction

When Redis reaches its `maxmemory` limit and evicts a frequently accessed key under `volatile-lru` or `allkeys-lru`, every subsequent request for that key misses the cache at once. The result is a thundering herd of requests hitting the database to recompute the same value, potentially overwhelming the backend.
## Symptoms

- Database CPU spikes to 100% immediately after Redis evictions
- `INFO stats` shows `evicted_keys` increasing rapidly during traffic peaks
- Application logs show hundreds of identical cache-miss errors within seconds
- Redis `instantaneous_ops_per_sec` drops as clients wait for backend regeneration
- `INFO stats` shows `keyspace_misses` growing faster than `keyspace_hits`
## Common Causes

- `maxmemory` set too low relative to the working set size
- `allkeys-lru` eviction removing warm keys that have not been accessed recently
- No cache regeneration coalescing, so every cache miss triggers its own database query
- Large value sizes reducing effective cache capacity
- No TTL set on cached items, leaving eviction as the only cleanup mechanism
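A related mitigation for the missing-TTL cause: always set a TTL, and add a small random jitter so that keys cached at the same moment do not all expire (and get recomputed) together. A minimal sketch; the helper names are illustrative, not part of redis-py:

```python
import json
import random

def jittered_ttl(base_ttl, jitter_fraction=0.1):
    """Return base_ttl plus up to +/-10% random jitter, so keys cached
    together do not all expire at the same instant."""
    jitter = int(base_ttl * jitter_fraction)
    return base_ttl + random.randint(-jitter, jitter)

def set_with_jitter(redis_client, key, data, base_ttl=300):
    # SETEX stores the value with a per-key randomized expiry
    redis_client.setex(key, jittered_ttl(base_ttl), json.dumps(data))
```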
## Step-by-Step Fix

1. **Identify eviction patterns and affected keys**:

```bash
# Monitor eviction rate in real time
redis-cli INFO stats | grep evicted_keys

# Run every 5 seconds
watch -n 5 'redis-cli INFO stats | grep -E "evicted_keys|keyspace_misses"'
```
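To turn those raw counters into a rate, diff successive `INFO stats` samples. A sketch using plain string parsing of the `INFO` text (a redis-py client's `info()` dict would work equally well; the helper names here are illustrative):

```python
def parse_info_stats(info_text):
    """Parse 'key:value' lines from INFO output into a dict of ints
    for the counters we care about."""
    counters = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            if key in ("evicted_keys", "keyspace_misses", "keyspace_hits"):
                counters[key] = int(value)
    return counters

def eviction_rate(prev, curr, interval_seconds):
    """Evictions per second between two INFO stats samples."""
    return (curr["evicted_keys"] - prev["evicted_keys"]) / interval_seconds
```

A sustained nonzero rate during normal traffic means the working set no longer fits in `maxmemory`.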
2. **Increase `maxmemory` or switch to a better eviction policy**:

```bash
# Check current settings
redis-cli CONFIG GET maxmemory
redis-cli CONFIG GET maxmemory-policy

# Set a more appropriate policy for cache workloads
redis-cli CONFIG SET maxmemory-policy volatile-lru
redis-cli CONFIG SET maxmemory 4gb
```
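Before picking a `maxmemory` value, a back-of-the-envelope estimate confirms the working set actually fits. The ~50-byte per-entry overhead below is an assumption for illustration; measure real keys with `MEMORY USAGE` instead:

```python
def estimate_required_memory(key_count, avg_key_bytes, avg_value_bytes,
                             per_key_overhead=50, headroom=1.25):
    """Estimate bytes needed to hold the working set, with headroom
    so eviction does not kick in under normal load."""
    per_entry = avg_key_bytes + avg_value_bytes + per_key_overhead
    return int(key_count * per_entry * headroom)

# Example: 1M keys, 40-byte keys, 2 KB values
needed = estimate_required_memory(1_000_000, 40, 2048)
```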
3. **Implement cache stampede protection with probabilistic early expiration**:

```python
import random
import time

def get_with_stampede_protection(redis_client, key, ttl=300):
    # regenerate_and_cache(redis_client, key, ttl) is assumed to be
    # defined elsewhere: it recomputes the value, stores it, and returns it.
    value = redis_client.get(key)
    if value is not None:
        # 10% chance of early refresh when the TTL is near expiry
        remaining_ttl = redis_client.ttl(key)
        if remaining_ttl < 30 and random.random() < 0.1:
            value = regenerate_and_cache(redis_client, key, ttl)
        return value

    # Cache miss: use a distributed lock so only one client regenerates
    lock_key = f"lock:{key}"
    if redis_client.set(lock_key, "1", nx=True, ex=10):
        try:
            return regenerate_and_cache(redis_client, key, ttl)
        finally:
            redis_client.delete(lock_key)
    else:
        # Another client holds the lock; wait briefly and retry
        time.sleep(0.1)
        return redis_client.get(key)
```
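The fixed "10% within 30 seconds" rule is simple but arbitrary. A common refinement, often called probabilistic early recomputation (the XFetch approach), lets the refresh probability rise smoothly as expiry approaches. This is a simplified variant for illustration, not the code from the step above:

```python
import math
import random

def should_refresh_early(remaining_ttl, recompute_seconds, beta=1.0):
    """XFetch-style rule: refresh when
    remaining_ttl <= -recompute_seconds * beta * ln(u), u uniform in (0, 1].
    The longer the recompute takes (and the closer expiry is), the more
    likely an early refresh becomes."""
    u = 1.0 - random.random()  # in (0, 1], avoids log(0)
    return remaining_ttl <= -recompute_seconds * beta * math.log(u)
```

Tuning `beta` above 1 favors earlier refreshes; below 1, later ones.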
4. **Implement request coalescing with Redis `SET NX`**:

```python
import time

def coalesced_cache_get(redis_client, key, compute_fn, ttl=300):
    value = redis_client.get(key)
    if value:
        return value

    lock_key = f"compute:{key}"
    lock_acquired = redis_client.set(lock_key, "1", nx=True, ex=30)
    if lock_acquired:
        try:
            result = compute_fn()
            redis_client.setex(key, ttl, result)
            return result
        finally:
            # Release the lock even if compute_fn raises
            redis_client.delete(lock_key)
    else:
        # Another request is computing; poll for its result
        for _ in range(50):
            time.sleep(0.1)
            value = redis_client.get(key)
            if value:
                return value
        return compute_fn()  # Fallback if the wait times out
```
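To see the coalescing mechanics without a live server, you can exercise the same `SET NX` semantics against a minimal in-memory stand-in for the client. The `FakeRedis` class below is a test double written for this sketch, not part of redis-py; only the first caller wins the lock, so only one recomputation runs:

```python
class FakeRedis:
    """In-memory test double supporting only the calls used above."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None  # NX failed: key already exists
        self._data[key] = value
        return True

    def setex(self, key, ttl, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

fake = FakeRedis()
first = fake.set("compute:user:1", "1", nx=True, ex=30)   # True: lock acquired
second = fake.set("compute:user:1", "1", nx=True, ex=30)  # None: must wait
```

Note the double does not model expiry (`ex` is ignored), which is enough for testing the lock handoff but not timeout behavior.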
5. **Use Redis memory efficiently by compressing large values**:

```python
import zlib
import json

def set_compressed(redis_client, key, data, ttl=300):
    compressed = zlib.compress(json.dumps(data).encode(), level=6)
    redis_client.setex(key, ttl, compressed)

def get_compressed(redis_client, key):
    data = redis_client.get(key)
    if data:
        return json.loads(zlib.decompress(data).decode())
    return None
```
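A quick sanity check of what compression actually buys: repetitive JSON (lists of similar records) compresses very well, while high-entropy blobs may not shrink at all, so measure with your real payloads before adopting this step:

```python
import json
import zlib

# A repetitive payload, like a list of similar records
records = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(500)]
raw = json.dumps(records).encode()
compressed = zlib.compress(raw, level=6)
savings = 1 - len(compressed) / len(raw)
```

The extra CPU for compress/decompress is usually a good trade when it lets the working set fit under `maxmemory` and stops evictions.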