Fix AWS ALB Health Check Failing

Introduction

AWS Application Load Balancer (ALB) health checks monitor target health by sending periodic requests to configured endpoints. When health checks fail, targets are marked unhealthy and removed from the load balancing pool, potentially causing service disruption. Health check failures can result from endpoint misconfiguration, security group issues, timeout settings, or application-level errors.

Symptoms

Error messages in AWS console and CloudWatch:

bash

Target.HealthCheckFailed: Health checks failed with no response
Target is in unhealthy state
Health check timeout exceeded
HTTP status code 404 returned

Observable indicators: - Target status shows "unhealthy" in Target Group - CloudWatch metric UnHealthyHostCount increasing - HTTP 502/503 errors from ALB - Targets deregistered from routing pool - Health check path returning non-expected status

Common Causes

1.Health check path wrong - Path doesn't exist or returns different status
2.Security group blocking - ALB cannot reach target on health check port
3.Timeout too short - Application slower than health check timeout
4.Wrong port configured - Health check targeting incorrect port
5.Expected status mismatch - App returns different status than matcher expects
6.Instance under heavy load - Cannot respond within timeout
7.VPC routing issues - Network path between ALB and target blocked

Step-by-Step Fix

Step 1: Check Target Health Status

```bash # Describe target health aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789

# Get specific target details aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --targets Id=i-1234567890abcdef0,Port=80

# Check health check configuration aws elbv2 describe-target-groups \ --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 ```

Step 2: Test Health Endpoint Directly

```bash # Test from an instance in the same VPC curl -v http://10.0.0.1:80/health

# Test with the exact health check path curl -v http://target-ip:80/health-check-path

# Check response timing curl -w "Connect: %{time_connect}s TTFB: %{time_starttransfer}s Total: %{time_total}s\n" \ http://target-ip/health

# Test from ALB subnet (if accessible) curl -v http://target-ip/health ```

Step 3: Check Security Groups

```bash # Get target instance security group aws ec2 describe-instances --instance-ids i-1234567890abcdef0 \ --query 'Reservations[0].Instances[0].SecurityGroups'

# Get ALB security group aws elbv2 describe-load-balancers --names my-alb \ --query 'LoadBalancers[0].SecurityGroups'

# Verify inbound rules allow health check port aws ec2 describe-security-groups --group-ids sg-123456789 \ --query 'SecurityGroups[0].IpPermissions'

# Check if health check port is allowed aws ec2 describe-security-groups --group-ids sg-target \ --filters Name=ip-permission.from-port,Values=80 ```

Step 4: Fix Target Group Health Check Configuration

bash

# Update health check settings
aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
    --health-check-path /health \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 10 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 3 \
    --matcher HttpCode=200-299

```hcl # Terraform configuration resource "aws_lb_target_group" "app" { name = "app-target-group" port = 80 protocol = "HTTP" vpc_id = var.vpc_id

health_check { enabled = true healthy_threshold = 2 interval = 30 matcher = "200-299" path = "/health" port = "traffic-port" protocol = "HTTP" timeout = 10 unhealthy_threshold = 3 } } ```

Step 5: Fix Security Group Rules

```bash # Add inbound rule for health check from ALB aws ec2 authorize-security-group-ingress \ --group-id sg-target \ --protocol tcp \ --port 80 \ --source-group sg-alb

# Or allow from ALB subnet CIDR aws ec2 authorize-security-group-ingress \ --group-id sg-target \ --protocol tcp \ --port 80 \ --cidr 10.0.1.0/24 # ALB subnet CIDR ```

Step 6: Fix Application Health Endpoint

```bash # Ensure health endpoint returns correct response curl -v http://target-ip/health

# Expected: HTTP 200 OK # Fix application if it returns: # - 404 (endpoint doesn't exist) # - 301/302 (redirect instead of direct response) # - 500 (application error) # - Timeout (endpoint too slow) ```

python

# Example: Simple health endpoint
@app.route('/health')
def health():
    return jsonify({'status': 'healthy'}), 200

Step 7: Configure Proper Matcher

```bash # Accept broader status code range aws elbv2 modify-target-group \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --matcher HttpCode=200-399

# Or accept specific codes aws elbv2 modify-target-group \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --matcher HttpCode=200,201,202,204 ```

Step 8: Verify the Fix

bash # Check target health after changes aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --query 'TargetHealthDescriptions[?TargetHealth.State==healthy`]'

# Monitor CloudWatch metrics aws cloudwatch get-metric-statistics \ --namespace AWS/ApplicationELB \ --metric-name HealthyHostCount \ --dimensions Name=TargetGroup,Value=mytg Name=LoadBalancer,Value=my-alb \ --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \ --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \ --period 60 \ --statistics Average

# Test through ALB curl -v http://my-alb.region.elb.amazonaws.com/ ```

Advanced Diagnosis

Check Health Check Logs

```bash # Enable ALB access logs aws elbv2 modify-load-balancer-attributes \ --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/123456789 \ --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=my-bucket

# Query health check entries in logs aws s3 ls s3://my-bucket/AWSLogs/account/elasticloadbalancing/region/$(date +%Y/%m/%d)/

# Parse health check entries grep "ELB-HealthChecker" access_logs.csv | head -20 ```

Check Target Registration

```bash # Verify targets are properly registered aws elbv2 describe-targets \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789

# Check target port aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --query 'TargetHealthDescriptions[].Target.Port' ```

Use Lambda Health Check

bash

# For complex health checks, use Lambda targets
aws elbv2 register-targets \
    --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
    --targets Id=arn:aws:lambda:region:account:function:health-check

CloudWatch Alarms

bash

# Create alarm for unhealthy targets
aws cloudwatch put-metric-alarm \
    --alarm-name ALB-Unhealthy-Targets \
    --metric-name UnHealthyHostCount \
    --namespace AWS/ApplicationELB \
    --statistic Sum \
    --period 60 \
    --threshold 1 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=TargetGroup,Value=mytg \
    --evaluation-periods 3 \
    --alarm-actions arn:aws:sns:region:account:alerts

Common Pitfalls

Health check returns redirect - 301/302 instead of 200 fails matcher
Security group from wrong source - Allowing 0.0.0.0/0 but not ALB SG
Wrong health check port - Port mismatch between config and target
Too aggressive thresholds - single failure marks unhealthy
Cross-zone issues - ALB in one zone, target in different zone
Protocol mismatch - HTTP health check on HTTPS port
Private DNS resolution - Target using private DNS not accessible

Best Practices

```hcl # Complete Terraform configuration resource "aws_lb_target_group" "app" { name = "app-tg" port = 80 protocol = "HTTP" vpc_id = var.vpc_id

health_check { enabled = true healthy_threshold = 2 interval = 30 matcher = "200-299" path = "/health" port = "traffic-port" protocol = "HTTP" timeout = 10 unhealthy_threshold = 3 }

deregistration_delay = 30

tags = { Name = "app-target-group" } }

# Security group for targets resource "aws_security_group_rule" "allow_alb_health_check" { type = "ingress" from_port = 80 to_port = 80 protocol = "tcp" source_security_group_id = aws_security_group.alb.id security_group_id = aws_security_group.app.id } ```

AWS ALB Target Unhealthy
HAProxy Health Check Failing
GCP Load Balancer Backend Down
Azure Load Balancer Health Probe

AWS ALB Health Check Failing

Introduction

Symptoms

Common Causes

Step-by-Step Fix

Step 1: Check Target Health Status

Step 2: Test Health Endpoint Directly

Step 3: Check Security Groups

Step 4: Fix Target Group Health Check Configuration

Step 5: Fix Security Group Rules

Step 6: Fix Application Health Endpoint

Step 7: Configure Proper Matcher

Step 8: Verify the Fix

Advanced Diagnosis

Check Health Check Logs

Check Target Registration

Use Lambda Health Check

CloudWatch Alarms

Common Pitfalls

Best Practices

Related Issues

Share this guide

More Load Balancer Troubleshooting Guides

Azure Front Door Routing Rule Not Matching

Azure Front Door Backend Unavailable

Azure Application Gateway SSL Certificate Missing

Azure Application Gateway WAF Blocks Legitimate

Azure Application Gateway 502

Azure Load Balancer Outbound Rule Not Working