Fix AWS ALB 503 Target Connection Failed

Introduction

AWS Application Load Balancer (ALB) 503 Service Unavailable errors occur when the ALB cannot route a request to a usable target. Typical triggers are a target group with no registered targets, targets that are all failing health checks, failing connection attempts, or a misconfigured listener or routing rule. Unlike 502 Bad Gateway (where the ALB reached a target but the response was invalid or the connection was closed), a 503 is generated by the ALB itself without a usable target response. Common causes include health check failures, security groups blocking traffic, crashed target applications, and ALB listener or routing misconfiguration.

Symptoms

- ALB returns HTTP 503 Service Unavailable to all requests
- CloudWatch metric HTTPCode_ELB_503_Count spikes
- Target group shows all targets as unhealthy or unused
- ALB access logs show target_status_code: - (no target response)
- Issue appears after a deploy, security group change, Auto Scaling event, or region outage
- Health check endpoint returns 200 when accessed directly but targets are still marked unhealthy

Common Causes

- All targets marked unhealthy due to health check failures
- Security group blocking ALB health check traffic
- Target group port doesn't match application listening port
- Application not started or crashed on target instances
- Health check path incorrect or requires authentication
- Target registration pending or deregistration in progress
- ALB subnet or routing misconfiguration

Step-by-Step Fix

### 1. Check target health status

Verify target group health:

```bash
# Check target health via AWS CLI
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/TG-NAME/ID

# Output shows each target's state:
# {
#   "TargetHealthDescriptions": [
#     {
#       "Target": {"Id": "i-0abc123", "Port": 80},
#       "HealthCheckPort": "80",
#       "TargetHealth": {
#         "State": "unhealthy",
#         "Reason": "Target.FailedHealthChecks",
#         "Description": "Health checks failed"
#       }
#     }
#   ]
# }

# Check via console
# EC2 > Target Groups > [select your TG] > Targets tab
# Look for "unhealthy" status and reason
```

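Target health states:

- initial: Target is registering or initial health checks are in progress
- healthy: Passing health checks
- unhealthy: Failing health checks
- unused: Target is not registered, its target group is not used by any listener rule, or its Availability Zone is not enabled for the ALB
- draining: Target is deregistering and connection draining is in progress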
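For a large target group, it can help to tally the states rather than read the JSON by eye. The following is a minimal sketch, assuming the input is one `<target-id> <state>` pair per line, as produced by the `--query` and `--output text` invocation shown in the comment. The function name `summarize_target_states` is hypothetical.

```bash
#!/usr/bin/env bash
# Tally target health states from "<target-id> <state>" lines on stdin.
# Intended input (assumption):
#   aws elbv2 describe-target-health --target-group-arn <target-group-arn> \
#     --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' \
#     --output text
summarize_target_states() {
  awk '
    NF >= 2 { count[$2]++ }          # second column is the state
    END {
      for (state in count) printf "%s=%d\n", state, count[state]
    }
  ' | sort
}
```

Piping the CLI output into `summarize_target_states` prints one `state=count` line per state. If no line reports `healthy`, the target group has nothing usable to route to.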

### 2. Check health check configuration

Verify health check settings match application:

```bash
# Get target group health check config
aws elbv2 describe-target-groups \
  --names <target-group-name> \
  --query 'TargetGroups[0].{
    Protocol:Protocol,
    Port:Port,
    HealthCheckProtocol:HealthCheckProtocol,
    HealthCheckPort:HealthCheckPort,
    HealthCheckPath:HealthCheckPath,
    HealthCheckIntervalSeconds:HealthCheckIntervalSeconds,
    HealthCheckTimeoutSeconds:HealthCheckTimeoutSeconds,
    HealthyThresholdCount:HealthyThresholdCount,
    UnhealthyThresholdCount:UnhealthyThresholdCount,
    Matcher:Matcher
  }'

# Example output:
# {
#   "Protocol": "HTTP",
#   "Port": 80,
#   "HealthCheckProtocol": "HTTP",
#   "HealthCheckPort": "traffic-port",
#   "HealthCheckPath": "/health",
#   "HealthCheckIntervalSeconds": 30,
#   "HealthCheckTimeoutSeconds": 5,
#   "HealthyThresholdCount": 2,
#   "UnhealthyThresholdCount": 3,
#   "Matcher": {"HttpCode": "200"}
# }
```

Common misconfigurations:

- HealthCheckPath points to a non-existent endpoint
- Matcher.HttpCode expects 200 but the app returns 204 or 301
- HealthCheckPort doesn't match the application port
- HealthCheckTimeoutSeconds is too short for the app's response time
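The matcher mismatch is easy to check offline. This sketch tests whether a status code returned by the application would be accepted by a given matcher string. It assumes the matcher is a single code, a comma-separated list, or a range, and the helper name `matcher_accepts` is hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 if HTTP status code $1 is accepted by matcher string $2.
# Supported matcher forms (assumption): "200", "200,204", "200-299".
matcher_accepts() {
  local code=$1 matcher=$2 part low high
  local IFS=','                       # split the matcher on commas
  for part in $matcher; do
    if [[ $part == *-* ]]; then
      low=${part%-*}                  # text before the dash
      high=${part#*-}                 # text after the dash
      if (( code >= low && code <= high )); then return 0; fi
    elif [[ $part == "$code" ]]; then
      return 0
    fi
  done
  return 1
}
```

For example, `matcher_accepts 301 "200"` fails, which matches the case of an app that redirects `/health` while the target group only accepts 200.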

Update health check settings:

```bash
# Update health check settings
aws elbv2 modify-target-group \
  --target-group-arn <target-group-arn> \
  --health-check-path /healthz \
  --health-check-port 8080 \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --matcher HttpCode=200-299
```

### 3. Verify security group allows ALB traffic

Security groups must allow health check traffic:

```bash
# Get ALB security group
aws elbv2 describe-load-balancers \
  --names <alb-name> \
  --query 'LoadBalancers[0].SecurityGroups'

# Get target instance security group
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,SecurityGroups[*].GroupId]'

# Check security group rules
aws ec2 describe-security-groups \
  --group-ids <target-sg-id> \
  --query 'SecurityGroups[0].IpPermissions'

# Expected: Inbound rule allowing ALB security group or CIDR
# {
#   "IpProtocol": "tcp",
#   "FromPort": 8080,
#   "ToPort": 8080,
#   "UserIdGroupPairs": [
#     {"GroupId": "sg-alb-id"}
#   ]
# }
```

Required security group rules:

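```
Target security group, inbound rules:
- Type: Custom TCP
- Port: <traffic-port> and <health-check-port> (e.g., 8080)
- Source: <ALB-security-group-id> OR <VPC-CIDR>

ALB security group, outbound rules:
- Type: Custom TCP
- Port: <traffic-port> and <health-check-port>
- Destination: <target-security-group-id> OR 0.0.0.0/0
```

Security groups are stateful, so the targets need no separate outbound rule for health check responses.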

Update security group:

```bash
# Add inbound rule for ALB
aws ec2 authorize-security-group-ingress \
  --group-id <target-sg-id> \
  --protocol tcp \
  --port 8080 \
  --source-group <alb-sg-id>
```
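To confirm that a rule actually covers the health check port, the rule list can be checked mechanically. This is a rough sketch, assuming each input line holds `<protocol> <from-port> <to-port> <source-group-id>`, for instance from the `--query` shown in the comment. The function name `sg_allows_port` is hypothetical, and the query inspects only the first source group of each rule.

```bash
#!/usr/bin/env bash
# Return 0 if any rule line on stdin allows TCP port $1 from security group $2.
# Intended input (assumption):
#   aws ec2 describe-security-groups --group-ids <target-sg-id> \
#     --query 'SecurityGroups[0].IpPermissions[].[IpProtocol,FromPort,ToPort,UserIdGroupPairs[0].GroupId]' \
#     --output text
sg_allows_port() {
  awk -v port="$1" -v sg="$2" '
    # Protocol -1 means all traffic, regardless of port.
    $4 == sg && $1 == "-1" { found = 1 }
    # TCP rule whose port range contains the requested port.
    $4 == sg && $1 == "tcp" && port + 0 >= $2 + 0 && port + 0 <= $3 + 0 { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

A non-zero exit status for the health check port suggests the ALB cannot reach the target, which typically surfaces as health check timeouts rather than refused connections.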

### 4. Check ALB health check IP ranges

ALB health checks originate from the load balancer nodes, which use private IP addresses in the subnets attached to the ALB. AWS does not publish a separate public IP range for them:

```
# ALB health check sources
# - Private IPs of the ALB nodes in each enabled subnet
# - These addresses can change as the ALB scales
#
# Health check reference:
# https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html
#
# If the target security group does not use the ALB SG as source,
# it must allow the CIDR of every ALB subnet
```

For Network Load Balancer (NLB) targets:

- Health checks come from the NLB's private IP addresses
- Allow traffic from the NLB subnet CIDRs

### 5. Test health check endpoint directly

Verify application responds correctly:

```bash
# SSH to target instance
ssh ec2-user@<instance-ip>

# Test health endpoint locally (-i sends a GET, as ALB health checks do)
curl -i http://localhost:8080/health

# Test from instance private IP
curl -i http://<instance-private-ip>:8080/health

# Expected response:
# HTTP/1.1 200 OK
# Content-Type: application/json
# Connection: keep-alive

# Check response time (must be < timeout)
time curl -s http://localhost:8080/health

# Check application is listening on correct interface
netstat -tlnp | grep :8080

# Expected: 0.0.0.0:8080 or :::8080 (all interfaces)
# Problematic: 127.0.0.1:8080 (localhost only)
```
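The interface check above can be scripted when auditing many instances. This is a small sketch that classifies a local listen address as printed by `netstat -tln` or `ss -tln`. The function name `classify_listen_addr` and its three labels are hypothetical.

```bash
#!/usr/bin/env bash
# Classify a listen address such as "0.0.0.0:8080" or "127.0.0.1:8080".
classify_listen_addr() {
  case "$1" in
    # Loopback: reachable only from the instance itself.
    127.*:*|'[::1]':*|::1:*) echo "loopback-only" ;;
    # Wildcard: reachable on every interface, including the one the ALB uses.
    0.0.0.0:*|'*':*|'[::]':*|:::*) echo "all-interfaces" ;;
    # Anything else is bound to one specific address.
    *) echo "specific-interface" ;;
  esac
}
```

A possible use is `classify_listen_addr "$(ss -tln | awk '$4 ~ /:8080$/ {print $4; exit}')"`. A `loopback-only` result explains why a local `curl` succeeds while ALB health checks fail.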

### 6. Check application status on targets

Verify application is running:

```bash
# Check via SSM Session Manager (no SSH needed)
aws ssm start-session --target <instance-id>

# Check application process
sudo systemctl status myapp

# Check application logs
sudo journalctl -u myapp -n 50 --no-pager

# Check listening port
sudo ss -tlnp | grep :8080

# Check for recent crashes
sudo journalctl -u myapp --since "1 hour ago" | grep -iE "error|crash|fatal"
```

For containerized applications (ECS):

```bash
# Check ECS task status
aws ecs describe-tasks \
  --cluster <cluster-name> \
  --tasks <task-arn> \
  --query 'tasks[0].{status:lastStatus,containers:containers}'

# Check container health
aws ecs describe-tasks \
  --cluster <cluster-name> \
  --tasks <task-arn> \
  --query 'tasks[0].containers[0].health'
```

### 7. Check ALB listener and routing rules

Verify ALB is configured correctly:

```bash
# Get listener configuration
aws elbv2 describe-listeners \
  --load-balancer-arn <alb-arn>

# Expected:
# {
#   "Protocol": "HTTP",
#   "Port": 80,
#   "DefaultActions": [
#     {
#       "Type": "forward",
#       "TargetGroupArn": "arn:aws:..."
#     }
#   ]
# }

# Get routing rules
aws elbv2 describe-rules \
  --listener-arn <listener-arn>
```

Common listener issues:

- Default action points to the wrong target group
- Listener port doesn't match the expected port (80 vs 443)
- HTTPS listener with an invalid or expired certificate
- Rule conditions too restrictive, so no targets match

### 8. Check Auto Scaling health status

Auto Scaling may mark instances unhealthy:

```bash
# Get Auto Scaling group instances
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query 'AutoScalingGroups[0].Instances[]'

# Check instance health status
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query 'AutoScalingGroups[0].Instances[*].{
    InstanceId:InstanceId,
    HealthStatus:HealthStatus,
    LifecycleState:LifecycleState
  }'

# HealthStatus can be:
# - Healthy: ELB and ASG health checks pass
# - Unhealthy: ELB or ASG health check failed
```

If instances marked unhealthy:

```bash
# Check ASG health check settings
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query 'AutoScalingGroups[0].{
    HealthCheckType:HealthCheckType,
    HealthCheckGracePeriod:HealthCheckGracePeriod
  }'

# HealthCheckType:
# - EC2: Only EC2 status checks
# - ELB: EC2 status checks plus ALB target health checks (recommended)
```

### 9. Check ALB CloudWatch metrics

Monitor ALB health with metrics:

```bash
# Dimension values are the final portion of each ARN, for example
# <alb-id> = app/my-alb/1234567890abcdef, <tg-id> = targetgroup/my-tg/1234567890abcdef
# The date invocations below use GNU date (-d)

# Get 503 count
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_ELB_503_Count \
  --dimensions Name=LoadBalancer,Value=<alb-id> \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum

# Get unhealthy host count (needs both TargetGroup and LoadBalancer dimensions)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name UnHealthyHostCount \
  --dimensions Name=TargetGroup,Value=<tg-id> Name=LoadBalancer,Value=<alb-id> \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average

# Get target response time
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name TargetResponseTime \
  --dimensions Name=TargetGroup,Value=<tg-id> Name=LoadBalancer,Value=<alb-id> \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average
```

Key metrics to alert on:

- HTTPCode_ELB_503_Count > 0: Service unavailable
- UnHealthyHostCount > 0: Targets failing health checks
- TargetResponseTime > threshold: Slow responses
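CloudWatch dimension values for these metrics are the final portion of the load balancer or target group ARN, which is easy to get wrong by hand. This is a minimal sketch of deriving them. The function name `cw_dimension_from_arn` is hypothetical.

```bash
#!/usr/bin/env bash
# Derive the CloudWatch dimension value from an ELBv2 ARN:
#   load balancer ARN -> "app/<name>/<id>"
#   target group ARN  -> "targetgroup/<name>/<id>"
cw_dimension_from_arn() {
  local arn=$1
  case "$arn" in
    *:loadbalancer/*) echo "${arn#*:loadbalancer/}" ;;
    *:targetgroup/*)  echo "targetgroup/${arn#*:targetgroup/}" ;;
    *) echo "unrecognized ARN: $arn" >&2; return 1 ;;
  esac
}
```

The result can be passed as the `Value=` part of a `--dimensions` argument when querying metrics or defining alarms.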

### 10. Check ALB access logs

Enable and analyze access logs:

```bash
# Enable access logs (the bucket policy must allow Elastic Load Balancing to write)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn <alb-arn> \
  --attributes Key=access_logs.s3.enabled,Value=true \
               Key=access_logs.s3.bucket,Value=<log-bucket-name>

# Download access logs
# Leading fields: type time elb client:port target:port request_processing_time
#   target_processing_time response_processing_time elb_status_code target_status_code ...
aws s3 cp s3://<log-bucket>/AWSLogs/<account-id>/elasticloadbalancing/<region>/<yyyy>/<mm>/<dd>/ . --recursive

# Find 503 errors (elb_status_code is field 9)
zcat *.gz | awk '$9 == 503 {print}' | head -20

# Check target_status_code (field 10) for 503s
zcat *.gz | awk '$9 == 503 {print $10}' | sort | uniq -c

# target_status_code = - means no target response was received
# target:port = - means the request was never dispatched to a target
```
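Grouping the 503s by target shows whether one instance or the whole group is affected. This sketch assumes the ALB log layout in which field 1 is the connection type, so `$5` is `target:port`, `$9` is `elb_status_code`, and `$10` is `target_status_code`. The function name `count_503_by_target` is hypothetical.

```bash
#!/usr/bin/env bash
# Count ALB 503 responses per target from access log lines on stdin.
# Output: "<count> <target:port> target_status=<code>", highest count first.
count_503_by_target() {
  awk '
    $9 == 503 { hits[$5 " target_status=" $10]++ }
    END { for (key in hits) printf "%d %s\n", hits[key], key }
  ' | sort -rn
}
```

A typical invocation would be `zcat *.gz | count_503_by_target`. A dominant `-` in the target column suggests requests never reached any target, which points back at registration, health checks, or routing rather than a single bad instance.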

Prevention

- Configure health check endpoints that return quickly (< 2 seconds)
- Set HealthCheckTimeoutSeconds to 3x p99 response time
- Use HealthyThresholdCount: 2 and UnhealthyThresholdCount: 3
- Enable access logs for troubleshooting
- Set up CloudWatch alarms for UnHealthyHostCount and 503 errors
- Configure Auto Scaling health checks to use ELB
- Test health check behavior in staging before production
- Document expected startup time and set HealthCheckGracePeriod accordingly

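Related Errors

- **502 Bad Gateway**: Target responded with an invalid or empty response
- **504 Gateway Timeout**: Target response exceeded the timeout
- **Connection Refused**: Application is not listening on the target port (a blocked security group usually shows up as a timeout instead)
- **Target.FailedHealthChecks**: Target is marked unhealthy because its health checks failed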
