# Fix AWS ELB Health Check Failing
Your load balancer shows targets as "unhealthy" or "OutOfService," and traffic isn't reaching your backend instances. ELB health checks are critical for ensuring traffic only goes to functioning backends, but when they fail unexpectedly, you need to quickly diagnose whether the issue is with the health check configuration, the backend application, or the network.
## Diagnosis Commands

First, identify the type of load balancer and its state:

```bash
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[*].[LoadBalancerName,Type,State.Code,DNSName]'
```

For Classic Load Balancers:
```bash
aws elb describe-load-balancers \
  --load-balancer-names my-classic-elb \
  --query 'LoadBalancerDescriptions[*].[LoadBalancerName,Instances[*].InstanceId,HealthCheck]'
```

For ALB/NLB, list the target groups attached to the load balancer:
```bash
aws elbv2 describe-target-groups \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/1234567890123456 \
  --query 'TargetGroups[*].[TargetGroupName,TargetGroupArn,HealthCheckPath,HealthCheckPort]'
```

Get target health details:
```bash
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetHealthDescriptions[*].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]'
```

Check health check configuration:
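```bash
# The health check settings are top-level fields of each target group,
# so select them individually rather than through a nested object
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].{Protocol:HealthCheckProtocol,Port:HealthCheckPort,Path:HealthCheckPath,IntervalSeconds:HealthCheckIntervalSeconds,TimeoutSeconds:HealthCheckTimeoutSeconds,HealthyThreshold:HealthyThresholdCount,UnhealthyThreshold:UnhealthyThresholdCount,Matcher:Matcher}'
```

For Classic Load Balancers: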
```bash
aws elb describe-load-balancers \
  --load-balancer-names my-classic-elb \
  --query 'LoadBalancerDescriptions[0].HealthCheck'
```

Check instance health for Classic ELB:
```bash
aws elb describe-instance-health \
  --load-balancer-name my-classic-elb \
  --query 'InstanceStates[*].[InstanceId,State,Description]'
```

Get CloudWatch metrics for health check failures:
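```bash
# UnHealthyHostCount counts targets failing health checks.
# Dimension values are the trailing parts of the target group and load balancer ARNs.
# The date -d syntax below is GNU date.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name UnHealthyHostCount \
  --dimensions Name=TargetGroup,Value=targetgroup/my-targets/1234567890123456 Name=LoadBalancer,Value=app/my-alb/1234567890123456 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Maximum \
  --output table
```

## Common Causes and Solutions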
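The `TargetHealth.Reason` value returned by `describe-target-health` usually narrows down which of the causes below to look at first. The helper here is only a rough triage sketch: `suggest_section` is a hypothetical name, the section titles are this guide's own, and the mapping is a heuristic rather than an official AWS decision table.

```bash
# Map a TargetHealth.Reason code to the section of this guide to read first.
# Heuristic only: one reason code can have several underlying causes.
suggest_section() {
  case "$1" in
    Target.ResponseCodeMismatch)
      echo "Health Check Path Returns Wrong Status" ;;
    Target.Timeout)
      echo "Health Check Timeout, then Security Group Blocking Health Checks" ;;
    Target.FailedHealthChecks)
      echo "Target Not Responding, Wrong Health Check Port, HTTP vs HTTPS Mismatch" ;;
    Target.NotInUse)
      echo "Target in Wrong Availability Zone" ;;
    Target.NotRegistered)
      echo "Instance Registration Issues" ;;
    Elb.InitialHealthChecking|Elb.RegistrationInProgress)
      echo "Wait: registration or initial checks are still in progress" ;;
    *)
      echo "No suggestion for reason: $1" ;;
  esac
}

# Example: suggest_section Target.Timeout
```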
### Health Check Path Returns Wrong Status

The backend returns something other than the expected success code (200 by default) on the health check path:

```bash
# Check what path is being used
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].HealthCheckPath'
```

Test the health check path directly on your backend:
```bash
# SSH into target instance
ssh -i my-key.pem ec2-user@target-ip

# Test health check endpoint
curl -v http://localhost:8080/health
curl -v http://localhost:8080/api/health
```
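When reading the status code that `curl` prints, compare it with the target group's success-code matcher and not only with 200: an ALB target group accepts a single code, a comma-separated list, or a range. The function below is a minimal sketch of that comparison; `code_matches` is a hypothetical helper, not part of the AWS CLI.

```bash
# Return success if HTTP status $1 satisfies a matcher string $2
# such as "200", "200,302", or "200-299".
code_matches() (
  code=$1
  IFS=,                      # split the matcher on commas (scoped to this subshell)
  for part in $2; do
    case "$part" in
      *-*)                   # a range like 200-299
        low=${part%-*}
        high=${part#*-}
        if [ "$code" -ge "$low" ] && [ "$code" -le "$high" ]; then exit 0; fi
        ;;
      *)                     # a single code
        if [ "$code" -eq "$part" ]; then exit 0; fi
        ;;
    esac
  done
  exit 1
)

# Example:
#   matcher=$(aws elbv2 describe-target-groups --target-group-arns "$ARN" \
#     --query 'TargetGroups[0].Matcher.HttpCode' --output text)
#   code_matches 204 "$matcher" && echo "would pass" || echo "would fail"
```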
Or test from outside:
```bash
curl -v http://target-ip:8080/health
```

Common issues:
404 Not Found: Wrong path in configuration

Update health check path:

```bash
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --health-check-path /api/health
```

403 Forbidden: Authentication blocking health checks
Create a dedicated health endpoint that doesn't require auth:
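```javascript
// Express.js example
const express = require('express');
const app = express();

// Register the health route before any auth middleware so it is never blocked
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// Or bypass auth middleware for health endpoint
// (this handler must also be mounted ahead of the auth middleware)
app.use('/health', (req, res, next) => {
  res.status(200).send('OK');
});
```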
500 Internal Server Error: Backend has errors

Check application logs:

```bash
ssh ec2-user@target-ip
sudo journalctl -u my-service -f
# or
tail -f /var/log/my-app/error.log
```

### Health Check Timeout
The backend takes longer to respond than the health check timeout allows. Check the timeout settings:

```bash
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].[HealthCheckTimeoutSeconds,HealthCheckIntervalSeconds]'
```

Increase timeout if backend is slow:
```bash
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --health-check-timeout-seconds 10 \
  --health-check-interval-seconds 30
```

Optimize backend response time:
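```python
# Assumes Flask; adapt the route declaration to your framework
from flask import Flask

app = Flask(__name__)

# Make health check endpoint lightweight
@app.route('/health')
def health():
    # Don't do expensive operations;
    # avoid database queries in the health check if possible
    return 'OK', 200
```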
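Before changing the values, it can help to check a measured response time against the configured timeout. The sketch below is an illustration under two assumptions: that the timeout should be shorter than the interval (an ALB generally rejects settings where it is not), and that `check_timing` is a hypothetical helper name. The response time is whatever `curl -w '%{time_total}'` reports for the health endpoint.

```bash
# check_timing TIMEOUT_SECONDS INTERVAL_SECONDS RESPONSE_SECONDS
# Prints "ok" and succeeds when the settings look consistent.
check_timing() (
  timeout=$1
  interval=$2
  response=$3
  if [ "$timeout" -ge "$interval" ]; then
    echo "timeout (${timeout}s) should be shorter than interval (${interval}s)"
    exit 1
  fi
  # awk handles the fractional seconds that curl reports
  if awk -v r="$response" -v t="$timeout" 'BEGIN { exit !(r >= t) }'; then
    echo "response (${response}s) would exceed the ${timeout}s timeout"
    exit 1
  fi
  echo "ok"
)

# Example:
#   t=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:8080/health)
#   check_timing 5 30 "$t"
```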
### Wrong Health Check Port

Health checks are hitting the wrong port:

```bash
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].[Port,HealthCheckPort]'
```

If HealthCheckPort is `traffic-port`, the health check uses the port each target was registered with. If it's a specific number, that port is used.
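The rule can be written as a one-line helper, which is handy when scripting checks across many targets. This is a sketch only; `effective_health_port` is a hypothetical name, and the second argument is the `Target.Port` value shown by `describe-target-health`.

```bash
# effective_health_port HEALTH_CHECK_PORT TARGET_PORT
# Prints the port the health check will actually connect to.
effective_health_port() {
  if [ "$1" = "traffic-port" ]; then
    echo "$2"   # fall back to the port the target was registered with
  else
    echo "$1"   # an explicit health check port overrides the traffic port
  fi
}

# Example: effective_health_port traffic-port 8080
```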
Update health check port:

```bash
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --health-check-port 8080
```

### Security Group Blocking Health Checks
The load balancer can't reach targets because of security group rules. Check the target's security groups:

```bash
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --query 'Reservations[0].Instances[0].SecurityGroups[*].GroupId'
```

Get security group rules:
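```bash
# UserIdGroupPairs lists rules that reference another security group,
# which is how ELB access is usually granted
aws ec2 describe-security-groups \
  --group-ids sg-12345678 \
  --query 'SecurityGroups[0].IpPermissions[*].[FromPort,ToPort,IpProtocol,IpRanges[*].CidrIp,UserIdGroupPairs[*].GroupId]'
```

Allow ELB to reach targets: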
```bash
# Get ELB security group
ELB_SG=$(aws elbv2 describe-load-balancers \
  --names my-alb \
  --query 'LoadBalancers[0].SecurityGroups[0]' \
  --output text)

# Or for Classic ELB
aws elb describe-load-balancers \
  --load-balancer-names my-classic-elb \
  --query 'LoadBalancerDescriptions[0].SourceSecurityGroup.GroupName'

# Allow inbound from ELB to target
aws ec2 authorize-security-group-ingress \
  --group-id sg-target \
  --protocol tcp \
  --port 8080 \
  --source-group "$ELB_SG"
```
For a Network Load Balancer, health checks come from the NLB's private IP addresses. Allow those:

```bash
# NLB node IPs fall inside the CIDR range of the NLB's subnet
aws ec2 describe-subnets \
  --subnet-ids subnet-12345 \
  --query 'Subnets[0].CidrBlock'

# Allow from NLB subnet
aws ec2 authorize-security-group-ingress \
  --group-id sg-target \
  --protocol tcp \
  --port 8080 \
  --cidr 10.0.1.0/24
```
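To confirm that a given source address is covered by the CIDR you allowed, a small membership check helps. The sketch below handles IPv4 only and uses plain shell arithmetic; `ip_to_int` and `cidr_contains` are hypothetical helper names, and the addresses in the example are made up.

```bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1                  # split the address into its four octets
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# cidr_contains CIDR IP: succeed if IP lies inside CIDR (IPv4 only).
cidr_contains() (
  network=${1%/*}
  bits=${1#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$network") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
)

# Example: cidr_contains 10.0.1.0/24 10.0.1.57 && echo "covered by the rule"
```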
### Threshold Settings

Overly strict thresholds mark targets unhealthy after a brief blip:

```bash
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].[HealthyThresholdCount,UnhealthyThresholdCount]'
```

Adjust thresholds:
```bash
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 5
```

### Target Not Responding
Backend service isn't running or listening:
Check if application is running:
```bash
ssh ec2-user@target-ip

# Check if process is running
ps aux | grep my-app
systemctl status my-service

# Check if port is listening
netstat -tlnp | grep 8080
ss -tlnp | grep 8080
```
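A plain `grep 8080` also matches unrelated lines, for example a peer port or a process ID that contains the digits. The filter below is a sketch that reads `ss -tln` output and prints only the local addresses listening on the exact port, assuming the usual column layout where the fourth field is `Local Address:Port`; `listeners_on_port` is a hypothetical name. If the only address printed is `127.0.0.1:8080`, the service accepts local connections only, so the load balancer cannot reach it.

```bash
# Read `ss -tln` output on stdin and print the local addresses
# that are listening on exactly port $1.
listeners_on_port() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Example: ss -tln | listeners_on_port 8080
```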
Start/restart the service:

```bash
sudo systemctl start my-service
sudo systemctl restart my-service
```

Check application logs:

```bash
sudo journalctl -u my-service -n 100
tail -100 /var/log/my-app/error.log
```

### HTTP vs HTTPS Mismatch
The health check protocol doesn't match what the server accepts:

```bash
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetGroups[0].HealthCheckProtocol'
```

If the server only accepts HTTPS but the health check uses HTTP:
```bash
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --health-check-protocol HTTPS
```

### Target in Wrong Availability Zone
For load balancers with cross-zone load balancing disabled, targets must be in Availability Zones that are enabled for the load balancer:

```bash
aws elbv2 describe-load-balancers \
  --names my-alb \
  --query 'LoadBalancers[0].AvailabilityZones[*].ZoneName'

aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0 \
  --query 'Reservations[0].Instances[0].[InstanceId,Placement.AvailabilityZone]'
```
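Comparing the two outputs by eye is easy to get wrong with many targets, so the check can be scripted. A minimal sketch, assuming the zone names are passed as separate arguments; `az_enabled` is a hypothetical helper name.

```bash
# az_enabled TARGET_AZ LB_AZ...
# Succeed if TARGET_AZ is one of the zones enabled on the load balancer.
az_enabled() (
  target_az=$1
  shift
  for az in "$@"; do
    [ "$az" = "$target_az" ] && exit 0
  done
  exit 1
)

# Example (left unquoted on purpose so the text output splits into words):
#   lb_azs=$(aws elbv2 describe-load-balancers --names my-alb \
#     --query 'LoadBalancers[0].AvailabilityZones[*].ZoneName' --output text)
#   az_enabled us-east-1c $lb_azs || echo "target zone is not enabled on the LB"
```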
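Enable cross-zone load balancing. This load balancer attribute applies to Network Load Balancers; on Application Load Balancers cross-zone load balancing is always on at the load balancer level and can only be overridden per target group:

```bash
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/1234567890123456 \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true
```

### Instance Registration Issues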
For Classic ELB, check if instances are registered:

```bash
aws elb describe-load-balancers \
  --load-balancer-names my-classic-elb \
  --query 'LoadBalancerDescriptions[0].Instances[*].InstanceId'
```

Register missing instances:
```bash
aws elb register-instances-with-load-balancer \
  --load-balancer-name my-classic-elb \
  --instances i-1234567890abcdef0
```

## Verification Steps
After making changes, verify health check status:
```bash
# Wait for health checks to run
sleep 60

# Check target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456 \
  --query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State]'
```
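A fixed `sleep 60` may be too short or too long, since a target needs roughly the health check interval multiplied by the healthy threshold to recover. One alternative is to poll until every target reports `healthy`. The function below is a sketch under stated assumptions: `wait_for_healthy` is a hypothetical name, and the defaults of 20 attempts and 15 seconds between them are arbitrary.

```bash
# wait_for_healthy TARGET_GROUP_ARN [MAX_ATTEMPTS] [DELAY_SECONDS]
# Poll target health until all targets are healthy or attempts run out.
wait_for_healthy() (
  arn=$1
  attempts=${2:-20}
  delay=${3:-15}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    states=$(aws elbv2 describe-target-health \
      --target-group-arn "$arn" \
      --query 'TargetHealthDescriptions[*].TargetHealth.State' \
      --output text) || exit 2
    # Succeed only if at least one target exists and none is in another state
    if [ -n "$states" ] && ! printf '%s\n' $states | grep -qv '^healthy$'; then
      echo "all targets healthy"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last states: $states"
  exit 1
)

# Example: wait_for_healthy "$TARGET_GROUP_ARN" 20 15
```

The AWS CLI also ships a built-in waiter, `aws elbv2 wait target-in-service --target-group-arn ...`, which may be enough when you do not need custom timing or output.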
Test end-to-end traffic:
```bash
curl -v http://my-alb-123456.us-east-1.elb.amazonaws.com/
```

Create a health check diagnostic script:
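```bash
#!/bin/bash
# Run from a host that can reach the targets' private addresses
TARGET_GROUP="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/1234567890123456"
# CloudWatch dimension values are the trailing parts of the ARNs
TG_DIMENSION="targetgroup/my-targets/1234567890123456"
LB_DIMENSION="app/my-alb/1234567890123456"

echo "ELB Health Check Diagnostics"
echo "============================"

echo "1. Health Check Configuration:"
aws elbv2 describe-target-groups \
  --target-group-arns "$TARGET_GROUP" \
  --query 'TargetGroups[0].{Protocol:HealthCheckProtocol,Port:HealthCheckPort,Path:HealthCheckPath,Interval:HealthCheckIntervalSeconds,Timeout:HealthCheckTimeoutSeconds,Healthy:HealthyThresholdCount,Unhealthy:UnhealthyThresholdCount}'

echo ""
echo "2. Target Health Status:"
aws elbv2 describe-target-health \
  --target-group-arn "$TARGET_GROUP" \
  --query 'TargetHealthDescriptions[*].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason]'

echo ""
echo "3. Testing health endpoints directly:"
for target in $(aws elbv2 describe-target-health --target-group-arn "$TARGET_GROUP" --query 'TargetHealthDescriptions[*].Target.Id' --output text); do
  # Instance targets are registered by ID, so resolve them to a private IP first
  case "$target" in
    i-*) host=$(aws ec2 describe-instances --instance-ids "$target" \
           --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text) ;;
    *)   host=$target ;;
  esac
  echo "Testing $target ($host)..."
  curl -s -o /dev/null -m 10 -w "HTTP %{http_code} in %{time_total}s\n" \
    "http://$host:8080/health"
done

echo ""
echo "4. Recent 5xx responses:"
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions Name=TargetGroup,Value="$TG_DIMENSION" Name=LoadBalancer,Value="$LB_DIMENSION" \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum
```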
Set up CloudWatch alarm for unhealthy targets:
```bash
# The ALB namespace is AWS/ApplicationELB and the metric is UnHealthyHostCount;
# dimension values are the trailing parts of the ARNs
aws cloudwatch put-metric-alarm \
  --alarm-name unhealthy-targets \
  --alarm-description "Targets are unhealthy" \
  --namespace AWS/ApplicationELB \
  --metric-name UnHealthyHostCount \
  --dimensions Name=TargetGroup,Value=targetgroup/my-targets/1234567890123456 Name=LoadBalancer,Value=app/my-alb/1234567890123456 \
  --statistic Average \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```