## Introduction

Redis executes Lua scripts atomically on its single-threaded event loop. If a script runs too long, whether due to large data processing, an infinite loop, or complex computation, all other commands queue up and eventually time out. This creates a cascading failure across all connected clients.

## Symptoms

- All Redis commands suddenly start timing out with `ERR Lua script timed out`
- `redis-cli CLIENT LIST` shows many clients in `cmd=eval` state waiting
- `INFO clients` shows `blocked_clients` spiking
- Redis logs show `BUSY Redis is busy running a script`
- Application connection pools exhaust while waiting for Redis responses
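While the blocking script runs, application-side calls fail with replies whose message starts with `BUSY`. A minimal client-side guard can distinguish that condition from other errors and retry briefly instead of cascading the failure; this is a sketch, and the helper name and retry parameters are illustrative, not part of any Redis client library:

```python
import time

def call_with_busy_retry(execute, retries=3, delay=0.5):
    """Retry a Redis call that fails with a BUSY reply.

    `execute` is any zero-argument callable that issues the command.
    Retries only succeed once the blocking script finishes or is
    killed, so keep the budget small and alert on repeated failures.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return execute()
        except Exception as exc:
            if "BUSY" not in str(exc):
                raise  # unrelated error: propagate immediately
            last_error = exc
            time.sleep(delay)
    raise last_error
```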

## Common Causes

- Lua script iterating over a large hash or sorted set without pagination
- Unbounded `KEYS *` pattern matching in a production database with millions of keys
- Recursive Lua script or infinite loop with no termination condition
- Script performing O(n) operations on large data structures
- `lua-time-limit` left at the default (5000 ms) or higher, allowing scripts to block for too long before Redis flags them

## Step-by-Step Fix

1. **Kill the currently running script**:

   ```bash
   # SCRIPT KILL stops the script, but only if it has not yet
   # written any data to the dataset
   redis-cli SCRIPT KILL

   # If the script already modified data, SCRIPT KILL won't work
   # (Redis replies UNKILLABLE); you must shut down Redis without saving
   redis-cli SHUTDOWN NOSAVE
   ```
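Which of the two paths applies can be detected from the reply itself: once the script has performed a write, Redis refuses `SCRIPT KILL` with an error starting with `UNKILLABLE`. A small sketch of that check for automated runbooks; the function name is illustrative:

```python
def must_shutdown_nosave(script_kill_error: str) -> bool:
    """Decide the recovery path from a failed SCRIPT KILL reply.

    Redis rejects SCRIPT KILL with an UNKILLABLE error once the
    script has written to the dataset; the only remaining option
    is then a restart that discards the partial write
    (SHUTDOWN NOSAVE).
    """
    return script_kill_error.upper().startswith("UNKILLABLE")
```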

2. **Identify long-running scripts in logs**:

   ```bash
   grep -iE "lua.*timeout|BUSY" /var/log/redis/redis-server.log
   redis-cli SLOWLOG GET 25
   ```

3. **Set appropriate Lua timeout limits**:

   ```bash
   redis-cli CONFIG SET lua-time-limit 1000
   # Note: this limit does not abort the script; once exceeded, Redis
   # starts replying BUSY and accepting SCRIPT KILL / SHUTDOWN NOSAVE
   # Make persistent
   echo "lua-time-limit 1000" >> /etc/redis/redis.conf
   ```

4. **Rewrite the script to process data in batches**:

   ```lua
   -- BAD: processes entire hash at once
   -- local data = redis.call('HGETALL', KEYS[1])

   -- GOOD: use HSCAN for incremental processing
   local cursor = "0"
   local results = {}
   repeat
     local result = redis.call('HSCAN', KEYS[1], cursor, 'COUNT', 100)
     cursor = result[1]
     local items = result[2]
     for i = 1, #items, 2 do
       -- Process each field-value pair
       table.insert(results, {items[i], items[i+1]})
     end
   until cursor == "0"
   return results
   ```
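The raw `HSCAN` reply has the same flat `[field1, value1, field2, value2, ...]` shape when driven from a client that exposes unparsed replies, and pairing it up mirrors the `for i = 1, #items, 2` loop in the Lua script. A minimal sketch; the helper name is illustrative:

```python
def pair_flat_reply(items):
    """Group a flat field/value reply [f1, v1, f2, v2, ...] into
    (field, value) pairs, mirroring the Lua pairing loop."""
    if len(items) % 2 != 0:
        raise ValueError("flat field/value reply must have even length")
    return [(items[i], items[i + 1]) for i in range(0, len(items), 2)]
```

Note that high-level clients such as redis-py already do this pairing for you (`hscan` returns a dict), so this applies only when working with raw replies.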

5. **Move heavy computation out of Lua and into application code**:

   ```python
   # Instead of a Lua script that aggregates:
   # BAD: lua_script = """local keys = redis.call('KEYS', 'user:*') ..."""

   # GOOD: use SCAN in application code
   def aggregate_user_data(redis_client):
       total = 0
       cursor = 0
       while True:
           cursor, keys = redis_client.scan(cursor=cursor, match='user:*', count=100)
           if keys:
               pipe = redis_client.pipeline()
               for key in keys:
                   pipe.hget(key, 'score')
               scores = [int(s or 0) for s in pipe.execute()]
               total += sum(scores)
           if cursor == 0:
               break
       return total
   ```

6. **Register pre-approved scripts to avoid repeated compilation overhead**:

   ```bash
   # Pre-load the script
   redis-cli SCRIPT LOAD "$(cat /opt/scripts/cache_cleanup.lua)"
   # Output: "sha1hash..."

   # Execute by SHA
   redis-cli EVALSHA sha1hash 1 mykey
   ```
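The handle returned by `SCRIPT LOAD` is simply the SHA-1 hex digest of the script's source text, so it can be computed client-side before deployment. A short sketch; the helper name is illustrative:

```python
import hashlib

def script_sha1(script_source: str) -> str:
    # Redis caches scripts under the SHA-1 hex digest of their
    # source, which is exactly the handle EVALSHA expects.
    return hashlib.sha1(script_source.encode("utf-8")).hexdigest()
```

In practice, redis-py's `register_script` wraps this pattern and transparently falls back to `EVAL` when the server replies with a `NOSCRIPT` error.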

## Prevention

- Set `lua-time-limit` to 1000 ms or less to fail fast
- Never use `KEYS *` in production; always use `SCAN` patterns
- Keep Lua scripts under 100 lines and avoid loops over unbounded data
- Pre-register scripts with `SCRIPT LOAD` and call them via `EVALSHA`
- Use the Redis 7.0+ `FUNCTION` feature with built-in execution time limits
- Monitor `blocked_clients` and the slow log for Lua-related entries
- Load-test Lua scripts with production-sized data before deployment
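The slow-log monitoring advice can be automated by filtering `SLOWLOG GET` output down to server-side script executions. A sketch assuming the entry shape redis-py returns, where each entry is a dict with a bytes `command` field; the helper name is illustrative:

```python
def lua_slowlog_entries(slowlog_entries):
    """Filter SLOWLOG GET entries down to Lua/function executions.

    EVAL and EVALSHA indicate Lua scripts; FCALL indicates
    Redis 7 functions. Anything else is an ordinary command.
    """
    prefixes = (b"EVAL", b"FCALL")
    return [
        entry for entry in slowlog_entries
        if entry.get("command", b"").upper().startswith(prefixes)
    ]
```

Feeding the result into an alerting pipeline, together with a `blocked_clients` threshold from `INFO clients`, catches slow scripts before they escalate into the timeout cascade described above.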