Introduction
When a popular cached object expires in Varnish, thousands of waiting requests may simultaneously attempt to fetch a fresh copy from the origin server. This "thundering herd" can overwhelm the origin, causing it to slow down or crash. If the origin crashes, Varnish cannot fill the cache, and subsequent requests also fail, creating a cascade. The origin may recover briefly, get hit again, and crash repeatedly in a fill loop.
Symptoms
- Origin server CPU and memory spike periodically
- Varnish logs show repeated cache misses for the same object:
```
- VCL_call MISS
- BackendOpen 32 boot.default 10.0.1.100 8080
- BackendClose 32 boot.default
```
- Origin server logs show a burst of identical requests:
```
GET /popular-page - 200 - 4500ms (normally 50ms)
```
- Varnish error log:
```
Backend connection failed (10.0.1.100:8080): Connection refused
```
- Site goes down every time a popular cache entry expires (e.g., every hour on the hour)
Common Causes
- Popular object expires without grace period
- Request coalescing defeated (e.g., hit-for-miss objects created by uncacheable responses)
- Origin server cannot handle the burst of uncached requests
- TTL too short for high-traffic content
- No stale content serving while revalidating
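Several of these causes come down to how three object-lifetime knobs interact. A minimal sketch of them in `vcl_backend_response` (the durations are illustrative, not recommendations for every site):

```vcl
sub vcl_backend_response {
    # Fresh for 5 minutes (tune to how often the content changes)
    set beresp.ttl = 5m;
    # After TTL expires: keep serving the stale copy for up to 1 hour
    # while a single background fetch revalidates it
    set beresp.grace = 1h;
    # After grace runs out: keep the object another day for conditional
    # revalidation (If-Modified-Since / If-None-Match)
    set beresp.keep = 1d;
}
```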
Step-by-Step Fix
1. Enable request coalescing in Varnish VCL:

```vcl
# /etc/varnish/default.vcl
sub vcl_recv {
    # Varnish coalesces identical requests automatically: if a fetch
    # for this object is already in flight, later requests wait for it
    # instead of hitting the origin. Returning (hash) for GET/HEAD
    # keeps requests on that cacheable path.
    if (req.method == "GET" || req.method == "HEAD") {
        return (hash);
    }
}

sub vcl_backend_response {
    # Serve stale content for up to 1 hour past TTL while a
    # background fetch revalidates the object
    set beresp.grace = 1h;
    # Keep the expired object for conditional revalidation
    # after the grace window runs out
    set beresp.keep = 24h;
}

sub vcl_deliver {
    # Add debugging headers
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    set resp.http.X-Cache-Hits = obj.hits;
}
```
2. Configure Varnish to serve stale content while fetching fresh:

```vcl
import std;

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # Healthy backend: allow up to 5 minutes of stale content
        # while a background fetch revalidates
        set req.grace = 300s;
        # Let clients explicitly force a refresh
        if (req.http.Cache-Control ~ "no-cache") {
            set req.hash_always_miss = true;
        }
    } else {
        # Unhealthy backend: serve stale content for up to 6 hours
        set req.grace = 6h;
    }
}

sub vcl_backend_response {
    # Objects stay usable for 6 hours past TTL; req.grace above
    # limits how much of that window each request may use
    set beresp.grace = 6h;
}
```
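Note that `std.healthy()` only reports real backend health if a probe is attached to the backend. A minimal probe sketch, assuming the origin exposes a `/health` endpoint (adapt the path and thresholds to your setup):

```vcl
backend default {
    .host = "10.0.1.100";
    .port = "8080";
    .probe = {
        .url = "/health";   # assumed health-check endpoint
        .interval = 5s;     # probe every 5 seconds
        .timeout = 2s;
        .window = 5;        # consider the last 5 probes
        .threshold = 3;     # at least 3 must succeed to count as healthy
    }
}
```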
3. Implement a shield (two-tier) setup for multi-Varnish deployments:

```vcl
# On edge Varnish nodes, set a backend that points to a shield Varnish.
# The shield handles all origin fetches, so the origin sees at most one
# fetch per object no matter how many edge nodes miss at once.

# Shield Varnish VCL
backend origin {
    .host = "10.0.1.100";
    .port = "8080";
    .connect_timeout = 5s;
    .first_byte_timeout = 30s;
    .between_bytes_timeout = 10s;
}
```
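The edge side of the same setup is simply a backend pointing at the shield instead of the origin. A sketch, assuming the shield listens on 10.0.1.50:80 (both values are placeholders):

```vcl
# Edge Varnish VCL: all misses go to the shield, never to the origin
backend shield {
    .host = "10.0.1.50";        # assumed shield address
    .port = "80";
    .connect_timeout = 2s;
    .first_byte_timeout = 60s;  # allow time for the shield's own origin fetch
}

sub vcl_recv {
    set req.backend_hint = shield;
}
```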
4. Monitor cache hit rate and detect thundering herd:

```bash
# Check Varnish statistics
varnishstat -1 | grep -E "MAIN.cache_hit|MAIN.cache_miss"

# Calculate hit rate
varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | \
  awk '/cache_hit/{hit=$2} /cache_miss/{miss=$2} END{printf "Hit rate: %.2f%%\n", hit/(hit+miss)*100}'

# Watch for cache miss spikes
watch -n 1 'varnishstat -1 | grep cache_miss'
```
5. Add origin server protection by capping backend connections:

```vcl
# Varnish already coalesces per object: only one fetch per URL goes to
# the origin while other clients wait on the waiting list. To cap total
# origin load across all objects, limit concurrent backend connections:
backend origin {
    .host = "10.0.1.100";
    .port = "8080";
    .max_connections = 50;  # excess fetches fail fast instead of piling up
}
```
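One caveat: coalescing only applies to cacheable objects. When the origin marks a response uncacheable, Varnish records a short-lived hit-for-miss marker so later requests bypass the waiting list rather than queuing behind a single slow fetch. A sketch mirroring the built-in VCL behavior:

```vcl
sub vcl_backend_response {
    if (beresp.ttl <= 0s || beresp.http.Set-Cookie) {
        # Remember "don't cache this" for 2 minutes so subsequent
        # requests are passed individually instead of serialized
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
    }
}
```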
6. Retry failed origin fetches and fall back to stale content:

```vcl
sub vcl_backend_response {
    if (beresp.status >= 500 && beresp.status < 600) {
        # Retry the fetch a few times before giving up
        if (bereq.retries < 3) {
            return (retry);
        }
        # If this is a background (grace) fetch, abandon it so the
        # client keeps getting the stale object instead of the error
        if (bereq.is_bgfetch) {
            return (abandon);
        }
        # Negative-cache the error briefly so recovery is picked up fast
        set beresp.ttl = 10s;
    }
}
```
Prevention
- Always set `beresp.grace` to at least 1 hour for popular content
- Use stale-while-revalidate semantics (grace in Varnish) to serve stale content during revalidation
- Monitor cache hit rate and alert if it drops below 90%
- Use a Varnish shield architecture for high-traffic sites
- Set appropriate TTLs based on content update frequency
- Test cache expiration scenarios with load testing tools
- Configure the origin server with adequate capacity for cache miss bursts