# Nginx Load Balancing Issues
Load balancing should distribute traffic evenly across backends, but you see uneven distribution, failed backends still receiving traffic, or sessions breaking. Users get logged out, shopping carts disappear, or one server takes all the load while others sit idle.
## Understanding Load Balancing Problems
Common load balancing issues:

1. **Uneven distribution** - traffic is not spread evenly
2. **Failed backend still used** - down servers still receive requests
3. **Session persistence broken** - users lose session state
4. **Slow failover** - dead servers are not removed quickly
5. **Unhealthy backends** - backends failing health checks still receive traffic
Check upstream status:

```bash
# Watch backend connections
watch -n 1 'ss -tn | grep :3000 | wc -l'

# Check Nginx status if configured
curl http://localhost/nginx_status
```
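If the stub_status page is enabled (see Common Cause 8), its output can be reduced to a quick summary. A small sketch that assumes the standard three-line stub_status format:

```bash
# Summarize stub_status output: active, reading, writing, waiting counts.
parse_status() {
  awk '/^Active connections/ {print "active=" $3}
       /Reading/ {print "reading=" $2, "writing=" $4, "waiting=" $6}'
}

# Against a live server:
#   curl -s http://localhost/nginx_status | parse_status
```

A steadily climbing `waiting` count with low `reading`/`writing` usually points at keepalive connections pinning traffic to a few backends.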
## Common Cause 1: Round-Robin Not Distributing
The default round-robin algorithm distributes requests evenly, but long-lived and keepalive connections can skew the actual load.
Problematic config:
```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    server backend3:3000;
}
```
Diagnosis:

```bash
# Count requests per backend from the access log.
# Assumes a log_format that writes $upstream_addr as the last field;
# awk cannot read Nginx variables like $upstream_addr directly.
awk '{print $NF}' /var/log/nginx/access.log | sort | uniq -c
```
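For that kind of log analysis to work, the access log must actually record the upstream address. A log_format along these lines (the name `upstream_log` is illustrative) puts `$upstream_addr` in a fixed final position:

```nginx
# Illustrative format; adapt fields to your existing log_format
log_format upstream_log '$remote_addr - [$time_local] "$request" '
                        '$status $body_bytes_sent $upstream_addr';

access_log /var/log/nginx/access.log upstream_log;
```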
Causes of uneven distribution:

- Keepalive connections bias traffic
- Long-lived connections skew distribution
- Weights not set appropriately
Solution: Adjust weights or use `least_conn`:

```nginx
upstream backend {
    least_conn;  # Route to the server with the fewest active connections
    server backend1:3000 weight=3;
    server backend2:3000 weight=2;
    server backend3:3000 weight=1;
    keepalive 32;
}
```
Or use IP hash for stateful apps:

```nginx
upstream backend {
    ip_hash;  # Same client IP goes to the same server
    server backend1:3000;
    server backend2:3000;
}
```
## Common Cause 2: Backend Health Checks Not Working
Open-source Nginx has only passive health checks; active health checks require Nginx Plus or a third-party module.
Problem:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    # No health checks - dead servers still receive requests
}
```
Passive health checks (open source):

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
}
```
After 3 failed attempts within the 30-second `fail_timeout` window, Nginx marks the server unavailable for the next 30 seconds, then retries it.
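Passive marking can be confirmed from the error log; Nginx logs a warning when it temporarily disables an upstream server. A sketch (the exact message wording can vary across versions):

```bash
# Count how often each backend was temporarily disabled.
count_disabled() {
  grep "upstream server temporarily disabled" \
    | grep -o 'upstream: "[^"]*"' | sort | uniq -c
}

# Against the real log:
#   count_disabled < /var/log/nginx/error.log
```

A backend that appears here repeatedly is flapping: it recovers just long enough to be retried, then fails again.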
Active health checks (Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    health_check interval=5s fails=3 passes=2;
}
```
Alternative for open source - combine passive checks with an external monitor:

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
}

# A monitoring script can remove dead servers and reload Nginx,
# or use third-party modules like nginx_upstream_check_module
```
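The monitor-script idea can be sketched in shell. Everything here is an assumption (the health endpoint, file paths, and backend list are not from the original config); it illustrates the approach, not a production implementation:

```bash
# Rebuild the upstream block from health-check results.
# Reads "host status" pairs on stdin and comments out dead servers.
render_upstream() {
  echo "upstream backend {"
  while read -r host status; do
    if [ "$status" = "up" ]; then
      echo "    server $host max_fails=3 fail_timeout=30s;"
    else
      echo "    # $host removed: failed health check"
    fi
  done
  echo "}"
}

# From a cron job (sketch): probe each backend, e.g.
#   curl -sf --max-time 2 "http://$host/health" && status=up || status=down
# then write the result and reload:
#   render_upstream > /etc/nginx/conf.d/backend_upstream.conf
#   nginx -t && nginx -s reload
```

Always run `nginx -t` before `nginx -s reload` so a malformed generated file never takes down the proxy.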
## Common Cause 3: Session Persistence Issues
Stateless load balancing breaks sessions stored on individual servers.
Problem: User logged in on backend1, next request goes to backend2 where they're not logged in.
Solution 1: Sticky sessions with IP hash:

```nginx
upstream backend {
    ip_hash;
    server backend1:3000;
    server backend2:3000;
}
```
Limitation: all clients behind the same NAT or proxy IP land on the same backend, which can itself cause uneven distribution.
Solution 2: Sticky cookie (requires Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000;
    sticky cookie srv_id expires=1h domain=example.com path=/;
}
```
Solution 3: Consistent hash on a session cookie (open source):

```nginx
upstream backend {
    hash $cookie_jsessionid consistent;
    server backend1:3000;
    server backend2:3000;
}
```
Solution 4: Store sessions externally:

The best solution - store sessions in Redis or a database so any backend can serve any user:

```nginx
upstream backend {
    least_conn;
    server backend1:3000;
    server backend2:3000;
}

# Configure the application to use Redis for session storage
```
## Common Cause 4: Backend Slow to Drain
When removing a backend, existing connections aren't drained gracefully.
Problem:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 down;  # Immediate cutoff - no draining
}
```
Solution: Use drain mode (Nginx Plus):

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 drain;  # No new sessions; existing ones finish
}
```
Or use graceful shutdown in open source:

```nginx
# In the backend, handle SIGTERM gracefully;
# in Nginx, keepalive lets existing connections finish
upstream backend {
    server backend1:3000;
    server backend2:3000;
    keepalive 32;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
## Common Cause 5: Connection Limits Not Set
Backends get overwhelmed by too many concurrent connections.
Solution: Set connection limits:

```nginx
upstream backend {
    server backend1:3000 max_conns=100;
    server backend2:3000 max_conns=100;
    keepalive 32;

    queue 100 timeout=30s;  # Nginx Plus only: queue requests when all backends are at max
}
```
## Common Cause 6: Backup Server Not Activating
Backup servers should take over when primaries fail, but don't.
Problematic config:

```nginx
upstream backend {
    server backend1:3000;
    server backend2:3000 backup;
    # backup only activates if ALL non-backup servers are down
}
```
Correct usage:

```nginx
upstream backend {
    server backend1:3000 max_fails=3 fail_timeout=30s;
    server backend2:3000 max_fails=3 fail_timeout=30s;
    server backend3:3000 backup;  # Only used when 1 and 2 are down
}
```
Verify backup activation:

```bash
# Check which backends are marked down (Nginx Plus API)
curl -s http://localhost/api/6/http/upstreams

# Or watch error logs for backend status changes
tail -f /var/log/nginx/error.log | grep "upstream"
```
## Common Cause 7: DNS Resolution Issues
Backend hostnames change IP but Nginx keeps old addresses.
Problem:

```nginx
upstream backend {
    server api1.example.com:3000;
    server api2.example.com:3000;
}
```
DNS is resolved only when the configuration is loaded; Nginx caches the addresses until the next reload.
Solution: Use a resolver with variables, which forces periodic re-resolution:

```nginx
resolver 8.8.8.8 valid=30s;

server {
    location / {
        set $upstream api.example.com;
        proxy_pass http://$upstream:3000;
    }
}
```
Or use Nginx Plus with the resolve parameter:

```nginx
upstream backend {
    server api1.example.com:3000 resolve;
    server api2.example.com:3000 resolve;
}
```
## Common Cause 8: No Load Balancing Status Monitoring
Can't see which backends are up, down, or getting traffic.
Solution: Enable status page:
For open source Nginx:
```nginx
server {
    listen 8080;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
```
For Nginx Plus:

```nginx
server {
    location /api {
        api write=on;
        allow 127.0.0.1;
        deny all;
    }
}
```

View upstream status:

```bash
curl http://localhost/api/6/http/upstreams
```
## Common Cause 9: SSL Backends Not Working
Connecting to HTTPS backends fails.
Problem:

```nginx
upstream backend {
    server api.example.com:443;
}

server {
    location / {
        proxy_pass http://backend;  # Wrong - speaks HTTP to an HTTPS port
    }
}
```
Solution:

```nginx
upstream backend {
    server api.example.com:443;
}

server {
    location / {
        proxy_pass https://backend;
        proxy_ssl_verify off;      # Only if the backend uses a self-signed cert
        proxy_ssl_server_name on;  # Send SNI to the backend
    }
}
```
## Complete Working Configuration
```nginx
upstream backend {
    least_conn;

    server backend1:3000 weight=2 max_fails=3 fail_timeout=30s max_conns=100;
    server backend2:3000 weight=2 max_fails=3 fail_timeout=30s max_conns=100;
    server backend3:3000 backup;

    keepalive 32;
    keepalive_timeout 60s;
    keepalive_requests 1000;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";

        proxy_connect_timeout 10s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
    }

    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
```
## Verification Steps
1. Check upstream configuration:

```bash
sudo nginx -T | grep -A 20 "upstream"
```

2. Monitor backend distribution:

```bash
# Real-time connection count
watch 'ss -tn state established "( dport = :3000 or sport = :3000 )" | wc -l'
```

3. Test failover:

```bash
# Stop one backend
sudo systemctl stop backend1

# Make requests and verify they go to other backends
for i in {1..10}; do curl -s http://example.com/health; done

# Check Nginx error log
tail -f /var/log/nginx/error.log
```

4. Check health status (Nginx Plus):

```bash
curl http://localhost/api/6/http/upstreams/backend
```
## Quick Reference
| Issue | Cause | Fix |
|---|---|---|
| Uneven distribution | Round-robin + keepalive | Use least_conn or weights |
| Sessions break | No persistence | Use ip_hash or external sessions |
| Dead servers used | No health checks | Add max_fails and fail_timeout |
| Slow failover | Passive checks only | Reduce fail_timeout |
| Connection overload | No limits | Set max_conns per server |
| DNS changes missed | Static resolution | Use resolver directive |
| Backup not used | All primaries must fail | Verify fail counts |
Load balancing issues often stem from default configurations not matching application needs. Consider session requirements, failure scenarios, and traffic patterns when tuning upstream settings.