# Nginx Load Balancing Issues
Load balancing should distribute traffic evenly across backends, but you see uneven distribution, failed backends still receiving traffic, or sessions breaking. Users get logged out, shopping carts disappear, or one server takes all the load while others sit idle.
## Understanding Load Balancing Problems
Common load balancing issues:

1. **Uneven distribution** - traffic is not spread evenly
2. **Failed backend still used** - down servers still receive requests
3. **Session persistence broken** - users lose session state
4. **Slow failover** - dead servers are not removed quickly
5. **Unhealthy backends** - backends failing health checks still receive traffic
Check upstream status:

```bash
# Watch backend connections
watch -n 1 'ss -tn | grep :3000 | wc -l'

# Check Nginx status if configured
curl http://localhost/nginx_status
```
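If the stub_status page is enabled (see Common Cause 8), its output can be reduced to a quick summary. A small sketch that assumes the standard three-line stub_status format:

```bash
# Summarize stub_status output: active, reading, writing, waiting counts.
parse_status() {
  awk '/^Active connections/ {print "active=" $3}
       /Reading/ {print "reading=" $2, "writing=" $4, "waiting=" $6}'
}

# Against a live server:
#   curl -s http://localhost/nginx_status | parse_status
```

A steadily climbing `waiting` count with low `reading`/`writing` usually points at keepalive connections pinning traffic to a few backends.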
## Common Cause 1: Round-Robin Not Distributing
The default round-robin algorithm distributes requests evenly, but long-lived and keepalive connections can skew the actual load.
Problematic config:
```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    server backend3:3000;
}
```
Diagnosis:

```bash
# Count requests per backend from the access log.
# Assumes a log_format that writes $upstream_addr as the last field;
# awk cannot read Nginx variables like $upstream_addr directly.
awk '{print $NF}' /var/log/nginx/access.log | sort | uniq -c
```
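For that kind of log analysis to work, the access log must actually record the upstream address. A log_format along these lines (the name `upstream_log` is illustrative) puts `$upstream_addr` in a fixed final position:

```nginx
# Illustrative format; adapt fields to your existing log_format
log_format upstream_log '$remote_addr - [$time_local] "$request" '
                        '$status $body_bytes_sent $upstream_addr';

access_log /var/log/nginx/access.log upstream_log;
```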
Causes of uneven distribution:

- Keepalive connections bias traffic
- Long-lived connections skew distribution
- Weights not set appropriately
Solution: Adjust weights or use `least_conn`:

```nginx
upstream backend {
    least_conn;  # Route to the server with the fewest active connections
    server backend1:3000 weight=3;
    server backend2:3000 weight=2;
    server backend3:3000 weight=1;
    keepalive 32;
}
```
Or use IP hash for stateful apps:

```nginx
upstream backend {
    ip_hash;  # Same client IP goes to the same server
    server backend1:3000;
    server backend2:3000;
}
```
## Common Cause 2: Backend Health Checks Not Working
Open-source Nginx has only passive health checks; active health checks require Nginx Plus or a third-party module.
Problem:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    # No health checks - dead servers still receive requests
}
```
Passive health checks (open source):

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
}
```
After 3 failed attempts within the 30-second `fail_timeout` window, Nginx marks the server unavailable for the next 30 seconds, then retries it.
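Passive marking can be confirmed from the error log; Nginx logs a warning when it temporarily disables an upstream server. A sketch (the exact message wording can vary across versions):

```bash
# Count how often each backend was temporarily disabled.
count_disabled() {
  grep "upstream server temporarily disabled" \
    | grep -o 'upstream: "[^"]*"' | sort | uniq -c
}

# Against the real log:
#   count_disabled < /var/log/nginx/error.log
```

A backend that appears here repeatedly is flapping: it recovers just long enough to be retried, then fails again.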
Active health checks (Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    health_check interval=5s fails=3 passes=2;
}
```
Alternative for open source - combine passive checks with an external monitor:

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
}

# A monitoring script can remove dead servers and reload Nginx,
# or use third-party modules like nginx_upstream_check_module
```
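The monitor-script idea can be sketched in shell. Everything here is an assumption (the health endpoint, file paths, and backend list are not from the original config); it illustrates the approach, not a production implementation:

```bash
# Rebuild the upstream block from health-check results.
# Reads "host status" pairs on stdin and comments out dead servers.
render_upstream() {
  echo "upstream backend {"
  while read -r host status; do
    if [ "$status" = "up" ]; then
      echo "    server $host max_fails=3 fail_timeout=30s;"
    else
      echo "    # $host removed: failed health check"
    fi
  done
  echo "}"
}

# From a cron job (sketch): probe each backend, e.g.
#   curl -sf --max-time 2 "http://$host/health" && status=up || status=down
# then write the result and reload:
#   render_upstream > /etc/nginx/conf.d/backend_upstream.conf
#   nginx -t && nginx -s reload
```

Always run `nginx -t` before `nginx -s reload` so a malformed generated file never takes down the proxy.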
## Common Cause 3: Session Persistence Issues
Stateless load balancing breaks sessions stored on individual servers.
Problem: User logged in on backend1, next request goes to backend2 where they're not logged in.
Solution 1: Sticky sessions with IP hash:

```nginx
upstream backend {
    ip_hash;
    server backend1:3000;
    server backend2:3000;
}
```
Limitation: all clients behind the same NAT or proxy IP land on the same backend, which can itself cause uneven distribution.
Solution 2: Sticky cookie (requires Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    sticky cookie srv_id expires=1h domain=example.com path=/;
}
```
Solution 3: Consistent hash on a session cookie (open source):

```nginx
upstream backend {
    hash $cookie_jsessionid consistent;
    server backend1:3000;
    server backend2:3000;
}
```
Solution 4: Store sessions externally:

The best solution - store sessions in Redis or a database so any backend can serve any user:

```nginx
upstream backend {
    least_conn;
    server backend1:3000;
    server backend2:3000;
}

# Configure the application to use Redis for session storage
```
## Common Cause 4: Backend Slow to Drain
When removing a backend, existing connections aren't drained gracefully.
Problem:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 down;  # Immediate cutoff - no draining
}
```
Solution: Use drain mode (Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 drain;  # No new sessions; existing ones finish
}
```
Or use graceful shutdown in open source:

```nginx
# In the backend, handle SIGTERM gracefully;
# in Nginx, keepalive lets existing connections finish
upstream backend {
    server backend1:3000;
    server backend2:3000;
    keepalive 32;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
## Common Cause 5: Connection Limits Not Set
Backends get overwhelmed by too many concurrent connections.
Solution: Set connection limits:

```nginx
upstream backend {
    server backend1:3000 max_conns=100;
    server backend2:3000 max_conns=100;
    keepalive 32;

    queue 100 timeout=30s;  # Nginx Plus only: queue requests when all backends are at max
}
```
## Common Cause 6: Backup Server Not Activating
Backup servers should take over when primaries fail, but don't.
Problematic config:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 backup;
    # backup only activates if ALL non-backup servers are down
}
```
Correct usage:

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
    server backend3:3000 backup;  # Only used when 1 and 2 are down
}
```
Verify backup activation:

```bash
# Check which backends are marked down (Nginx Plus API)
curl -s http://localhost/api/6/http/upstreams

# Or watch error logs for backend status changes
tail -f /var/log/nginx/error.log | grep "upstream"
```
## Common Cause 7: DNS Resolution Issues
Backend hostnames change IP but Nginx keeps old addresses.
Problem:

```nginx
upstream backend {
    server api1.example.com:3000;
    server api2.example.com:3000;
}
```
DNS is resolved only when the configuration is loaded; Nginx caches the addresses until the next reload.
Solution: Use a resolver with variables, which forces periodic re-resolution:

```nginx
resolver 8.8.8.8 valid=30s;

server {
    location / {
        set $upstream api.example.com;
        proxy_pass http://$upstream:3000;
    }
}
```
Or use Nginx Plus with the resolve parameter:

```nginx
upstream backend {
    server api1.example.com:3000 resolve;
    server api2.example.com:3000 resolve;
}
```
## Common Cause 8: No Load Balancing Status Monitoring
Can't see which backends are up, down, or getting traffic.
Solution: Enable status page:
For open source Nginx:
```nginx
server {
    listen 8080;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
```
For Nginx Plus:

```nginx
server {
    location /api {
        api write=on;
        allow 127.0.0.1;
        deny all;
    }
}
```

View upstream status:

```bash
curl http://localhost/api/6/http/upstreams
```
## Common Cause 9: SSL Backends Not Working
Connecting to HTTPS backends fails.
Problem:

```nginx
upstream backend {
    server api.example.com:443;
}

server {
    location / {
        proxy_pass http://backend;  # Wrong - speaks HTTP to an HTTPS port
    }
}
```
Solution:

```nginx
upstream backend {
    server api.example.com:443;
}

server {
    location / {
        proxy_pass https://backend;
        proxy_ssl_verify off;      # Only if the backend uses a self-signed cert
        proxy_ssl_server_name on;  # Send SNI to the backend
    }
}
```
## Complete Working Configuration
```nginx
upstream backend {
    least_conn;

    server backend1:3000 weight=2 max_fails=3 fail_timeout=30s max_conns=100;
    server backend2:3000 weight=2 max_fails=3 fail_timeout=30s max_conns=100;
    server backend3:3000 backup;

    keepalive 32;
    keepalive_timeout 60s;
    keepalive_requests 1000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";

        proxy_connect_timeout 10s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
    }

    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
```
## Verification Steps
1. Check upstream configuration:

```bash
sudo nginx -T | grep -A 20 "upstream"
```

2. Monitor backend distribution:

```bash
# Real-time connection count
watch 'ss -tn state established "( dport = :3000 or sport = :3000 )" | wc -l'
```

3. Test failover:

```bash
# Stop one backend
sudo systemctl stop backend1

# Make requests and verify they go to other backends
for i in {1..10}; do curl -s http://example.com/health; done

# Check Nginx error log
tail -f /var/log/nginx/error.log
```

4. Check health status (Nginx Plus):

```bash
curl http://localhost/api/6/http/upstreams/backend
```
## Quick Reference
| Issue | Cause | Fix |
|---|---|---|
| Uneven distribution | Round-robin + keepalive | Use least_conn or weights |
| Sessions break | No persistence | Use ip_hash or external sessions |
| Dead servers used | No health checks | Add max_fails and fail_timeout |
| Slow failover | Passive checks only | Reduce fail_timeout |
| Connection overload | No limits | Set max_conns per server |
| DNS changes missed | Static resolution | Use resolver directive |
| Backup not used | All primaries must fail | Verify fail counts |
Load balancing issues often stem from default configurations not matching application needs. Consider session requirements, failure scenarios, and traffic patterns when tuning upstream settings.