Introduction
AWS Application Load Balancer (ALB) health checks monitor target health by sending periodic requests to configured endpoints. When health checks fail, targets are marked unhealthy and removed from the load balancing pool, potentially causing service disruption. Health check failures can result from endpoint misconfiguration, security group issues, timeout settings, or application-level errors.
Symptoms
Error messages in AWS console and CloudWatch:
Target.HealthCheckFailed: Health checks failed with no response
Target is in unhealthy state
Health check timeout exceeded
HTTP status code 404 returned
Observable indicators:
- Target status shows "unhealthy" in Target Group
- CloudWatch metric UnHealthyHostCount increasing
- HTTP 502/503 errors from ALB
- Targets deregistered from routing pool
- Health check path returning non-expected status
Common Causes
1. Health check path wrong - The path doesn't exist or returns a different status
2. Security group blocking - The ALB cannot reach the target on the health check port
3. Timeout too short - The application responds more slowly than the health check timeout allows
4. Wrong port configured - The health check targets the wrong port
5. Expected status mismatch - The app returns a different status than the matcher expects
6. Instance under heavy load - The target cannot respond within the timeout
7. VPC routing issues - The network path between the ALB and the target is blocked
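The reason code that `describe-target-health` reports for an unhealthy target usually narrows down which of these causes applies. The following is a minimal sketch in Python. The reason codes are ones the ELBv2 API returns, but the mapping from code to cause is a heuristic assumption, and `likely_causes` is a hypothetical helper name.

```python
from typing import Dict, List

# Titles of the numbered causes listed above.
CAUSES: Dict[int, str] = {
    1: "Health check path wrong",
    2: "Security group blocking",
    3: "Timeout too short",
    4: "Wrong port configured",
    5: "Expected status mismatch",
    6: "Instance under heavy load",
    7: "VPC routing issues",
}

# Heuristic (assumed) mapping from ELBv2 reason codes to cause numbers.
REASON_TO_CAUSES: Dict[str, List[int]] = {
    # The target answered, but with a status outside the matcher.
    "Target.ResponseCodeMismatch": [1, 5],
    # No answer arrived before the health check timeout.
    "Target.Timeout": [2, 3, 6, 7],
    # The connection failed or the response was malformed.
    "Target.FailedHealthChecks": [4, 2],
}


def likely_causes(reason_code: str) -> List[str]:
    """Return the cause titles worth checking first for a reason code."""
    return [CAUSES[n] for n in REASON_TO_CAUSES.get(reason_code, [])]
```

For example, `likely_causes("Target.Timeout")` points at security groups, timeout settings, load, and routing rather than at the health check path.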
Step-by-Step Fix
Step 1: Check Target Health Status
```bash
# Describe target health
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789

# Get specific target details
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --targets Id=i-1234567890abcdef0,Port=80

# Check health check configuration
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789
```
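When a target group has many targets, it can help to reduce the JSON output to the targets that are not healthy. This is a rough sketch that assumes the response has the usual `TargetHealthDescriptions` shape. `unhealthy_targets` and `SAMPLE` are illustrative names, and the sample values are made up.

```python
from typing import Dict, List


def unhealthy_targets(response: Dict) -> List[Dict]:
    """Return one summary row per target whose state is not 'healthy'."""
    rows = []
    for desc in response.get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") == "healthy":
            continue
        rows.append({
            "id": desc["Target"]["Id"],
            "port": desc["Target"].get("Port"),
            "state": health.get("State"),
            "reason": health.get("Reason"),
            "description": health.get("Description"),
        })
    return rows


# Made-up sample in the shape of `aws elbv2 describe-target-health --output json`.
SAMPLE = {
    "TargetHealthDescriptions": [
        {
            "Target": {"Id": "i-1234567890abcdef0", "Port": 80},
            "TargetHealth": {"State": "healthy"},
        },
        {
            "Target": {"Id": "i-0fedcba0987654321", "Port": 80},
            "TargetHealth": {
                "State": "unhealthy",
                "Reason": "Target.Timeout",
                "Description": "Request timed out",
            },
        },
    ]
}
```

Calling `unhealthy_targets(SAMPLE)` yields a single row for the second instance, including its `Reason` and `Description` fields.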
Step 2: Test Health Endpoint Directly
```bash
# Test from an instance in the same VPC
curl -v http://10.0.0.1:80/health

# Test with the exact health check path
curl -v http://target-ip:80/health-check-path

# Check response timing
curl -w "Connect: %{time_connect}s TTFB: %{time_starttransfer}s Total: %{time_total}s\n" \
  http://target-ip/health

# Test from ALB subnet (if accessible)
curl -v http://target-ip/health
```
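The same check can be scripted so that the status code and elapsed time are captured together. This is a minimal sketch using only the Python standard library. `probe` is a hypothetical helper, the User-Agent value is arbitrary, and it does not follow redirects, so a 301 or 302 shows up as-is, much as it would fail a 200-only matcher.

```python
import http.client
import time
from typing import Optional, Tuple


def probe(host: str, port: int, path: str,
          timeout: float = 10.0) -> Tuple[Optional[int], float]:
    """Send one plain-HTTP GET and return (status, elapsed_seconds).

    status is None when the connection fails or times out. The timeout
    applies to each socket operation, not to the request as a whole.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    status: Optional[int] = None
    try:
        conn.request("GET", path, headers={"User-Agent": "health-probe-sketch"})
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, host unreachable, timeout, or malformed reply.
        status = None
    finally:
        conn.close()
    return status, time.monotonic() - start
```

Run it from a host in the same VPC, for example `probe("10.0.0.1", 80, "/health", timeout=10.0)`, and compare the elapsed time with the configured health check timeout.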
Step 3: Check Security Groups
```bash
# Get target instance security group
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 \
  --query 'Reservations[0].Instances[0].SecurityGroups'

# Get ALB security group
aws elbv2 describe-load-balancers --names my-alb \
  --query 'LoadBalancers[0].SecurityGroups'

# Verify inbound rules allow health check port
aws ec2 describe-security-groups --group-ids sg-123456789 \
  --query 'SecurityGroups[0].IpPermissions'

# Check if health check port is allowed
aws ec2 describe-security-groups --group-ids sg-target \
  --filters Name=ip-permission.from-port,Values=80
```
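Reading `IpPermissions` output by eye is error-prone when a group has many rules. This sketch checks whether a set of inbound rules admits the health check port from the ALB, either by security group reference or by source IP. It assumes the rule structure that `describe-security-groups` returns and handles IPv4 ranges only. `allows`, `port_in_rule`, and the sample IDs are illustrative.

```python
import ipaddress
from typing import Dict, List, Optional


def port_in_rule(rule: Dict, port: int) -> bool:
    """True if the rule covers the TCP port (protocol -1 means all traffic)."""
    if rule.get("IpProtocol") == "-1":
        return True
    if rule.get("IpProtocol") != "tcp":
        return False
    return rule.get("FromPort", -1) <= port <= rule.get("ToPort", -1)


def allows(permissions: List[Dict], port: int,
           source_sg: Optional[str] = None,
           source_ip: Optional[str] = None) -> bool:
    """True if any inbound rule admits the port from the given source."""
    for rule in permissions:
        if not port_in_rule(rule, port):
            continue
        # Match by referenced security group (the ALB's group).
        if source_sg and any(pair.get("GroupId") == source_sg
                             for pair in rule.get("UserIdGroupPairs", [])):
            return True
        # Match by IPv4 CIDR (an address in an ALB subnet).
        if source_ip:
            addr = ipaddress.ip_address(source_ip)
            for ip_range in rule.get("IpRanges", []):
                if addr in ipaddress.ip_network(ip_range["CidrIp"], strict=False):
                    return True
    return False


# Made-up rules in the shape of SecurityGroups[0].IpPermissions.
SAMPLE_PERMISSIONS = [
    {
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": "10.0.1.0/24"}],
        "UserIdGroupPairs": [{"GroupId": "sg-alb"}],
    }
]
```

With the sample rules, `allows(SAMPLE_PERMISSIONS, 80, source_sg="sg-alb")` is true, while the same call for port 8080 is false, which is the pattern behind a wrong-port health check failure.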
Step 4: Fix Target Group Health Check Configuration
```bash
# Update health check settings
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 10 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --matcher HttpCode=200-299
```
```hcl
# Terraform configuration
resource "aws_lb_target_group" "app" {
  name     = "app-target-group"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200-299"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 10
    unhealthy_threshold = 3
  }
}
```
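The interval, timeout, and thresholds interact: the timeout must be shorter than the interval, and the thresholds decide how long a state change takes. The sketch below validates a set of values and estimates that delay. The numeric limits are what I understand the documented ALB ranges to be, so treat them as assumptions to verify. `check_settings` and `seconds_to_flip` are hypothetical helpers.

```python
from typing import List


def check_settings(interval: int, timeout: int,
                   healthy: int, unhealthy: int) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    # Assumed ALB limits for HTTP/HTTPS health checks.
    if not 5 <= interval <= 300:
        problems.append("interval must be 5-300 seconds")
    if not 2 <= timeout <= 120:
        problems.append("timeout must be 2-120 seconds")
    if timeout >= interval:
        problems.append("timeout must be smaller than the interval")
    if not 2 <= healthy <= 10:
        problems.append("healthy threshold must be 2-10")
    if not 2 <= unhealthy <= 10:
        problems.append("unhealthy threshold must be 2-10")
    return problems


def seconds_to_flip(interval: int, threshold: int) -> int:
    """Approximate time for `threshold` consecutive checks to complete."""
    return interval * threshold
```

With the settings used above (interval 30, timeout 10, thresholds 2 and 3), `check_settings(30, 10, 2, 3)` reports no problems, a failing target is removed after roughly `seconds_to_flip(30, 3)` = 90 seconds, and a recovered target returns after roughly 60 seconds.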
Step 5: Fix Security Group Rules
```bash
# Add inbound rule for health check from ALB
aws ec2 authorize-security-group-ingress \
  --group-id sg-target \
  --protocol tcp \
  --port 80 \
  --source-group sg-alb

# Or allow from ALB subnet CIDR (10.0.1.0/24 is the ALB subnet in this example)
aws ec2 authorize-security-group-ingress \
  --group-id sg-target \
  --protocol tcp \
  --port 80 \
  --cidr 10.0.1.0/24
```
Step 6: Fix Application Health Endpoint
```bash
# Ensure health endpoint returns correct response
curl -v http://target-ip/health

# Expected: HTTP 200 OK
# Fix application if it returns:
# - 404 (endpoint doesn't exist)
# - 301/302 (redirect instead of direct response)
# - 500 (application error)
# - Timeout (endpoint too slow)
```
```python
# Example: Simple health endpoint (Flask)
from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/health')
def health():
    # Respond with 200 directly, without a redirect
    return jsonify({'status': 'healthy'}), 200
```
Step 7: Configure Proper Matcher
```bash
# Accept broader status code range
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --matcher HttpCode=200-399

# Or accept specific codes
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --matcher HttpCode=200,201,202,204
```
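To see in advance whether a given response code would pass a matcher, the matcher syntax (single codes, comma-separated lists, and ranges) can be evaluated locally. This is an illustrative sketch of that logic, not the load balancer's own implementation. `matches` is a hypothetical helper.

```python
def matches(matcher: str, status: int) -> bool:
    """True if status satisfies a matcher such as '200', '200,204' or '200-299'."""
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. 200-299.
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False
```

For instance, `matches("200-299", 302)` is false, which is why a health endpoint that redirects fails the default-style matcher, while `matches("200-399", 302)` is true.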
Step 8: Verify the Fix
```bash
# Check target health after changes
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --query 'TargetHealthDescriptions[?TargetHealth.State==`healthy`]'

# Monitor CloudWatch metrics
# Dimension values are the ARN suffixes, not the bare resource names
# (date -d requires GNU date)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HealthyHostCount \
  --dimensions Name=TargetGroup,Value=targetgroup/mytg/123456789 Name=LoadBalancer,Value=app/my-alb/123456789 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average

# Test through ALB
curl -v http://my-alb.region.elb.amazonaws.com/
```
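Targets do not turn healthy immediately after a fix, because the healthy threshold has to be met first, so verification usually means polling. The sketch below shows one way to structure that loop. `wait_until_healthy` is a hypothetical helper, and `fetch_states` is an assumed callable that returns the current list of target states (for example, by wrapping the `describe-target-health` call above).

```python
import time
from typing import Callable, List


def wait_until_healthy(fetch_states: Callable[[], List[str]],
                       max_wait: float = 300.0,
                       poll_every: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every target reports 'healthy' or max_wait seconds pass.

    Returns True when all targets are healthy, False on timeout.
    An empty state list (no registered targets) never counts as healthy.
    """
    deadline = time.monotonic() + max_wait
    while True:
        states = fetch_states()
        if states and all(state == "healthy" for state in states):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_every)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. In normal use the default `time.sleep` applies.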
Advanced Diagnosis
Check Health Check Logs
```bash
# Enable ALB access logs (useful for the 502/503 responses clients see)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/123456789 \
  --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=my-bucket

# List today's access log files
aws s3 ls s3://my-bucket/AWSLogs/account/elasticloadbalancing/region/$(date -u +%Y/%m/%d)/

# ALB access logs do not record health check requests, so look for them in
# the target's own web server log (the path is an example; adjust for your server)
grep "ELB-HealthChecker" /var/log/nginx/access.log | tail -20
```
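Health check requests carry the `ELB-HealthChecker` user agent, so a target's own web server log can be tallied by path and status code to see what the load balancer actually receives. This sketch assumes the common "combined" log format used by nginx and Apache. `health_check_statuses` and the sample lines are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line and the status code that follows it.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def health_check_statuses(lines: Iterable[str]) -> Counter:
    """Count (path, status) pairs for requests sent by the ELB health checker."""
    counts: Counter = Counter()
    for line in lines:
        if "ELB-HealthChecker" not in line:
            continue
        match = REQUEST.search(line)
        if match:
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


# Made-up log lines in combined log format.
SAMPLE_LINES = [
    '10.0.1.25 - - [01/Jan/2024:00:00:00 +0000] "GET /health HTTP/1.1" 200 21 "-" "ELB-HealthChecker/2.0"',
    '10.0.1.25 - - [01/Jan/2024:00:00:30 +0000] "GET /health HTTP/1.1" 200 21 "-" "ELB-HealthChecker/2.0"',
    '10.0.2.40 - - [01/Jan/2024:00:00:31 +0000] "GET /health HTTP/1.1" 404 0 "-" "ELB-HealthChecker/2.0"',
    '203.0.113.9 - - [01/Jan/2024:00:00:32 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
]
```

If the tally shows the expected path with a status outside the matcher, the problem is in the application or the matcher. If no health checker lines appear at all, the requests are not reaching the target, which points to security groups, port, or routing.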
Check Target Registration
```bash
# Verify targets are properly registered (lists every registered target and its state)
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --query 'TargetHealthDescriptions[].{Id:Target.Id,Port:Target.Port,State:TargetHealth.State,Reason:TargetHealth.Reason}'

# Check target port
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --query 'TargetHealthDescriptions[].Target.Port'
```
Use Lambda Health Check
```bash
# For complex health checks, use Lambda targets
# (the target group must have target type "lambda")
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
  --targets Id=arn:aws:lambda:region:account:function:health-check
```
CloudWatch Alarms
```bash
# Create alarm for unhealthy targets
# Fires when at least one target is unhealthy for 3 consecutive minutes
aws cloudwatch put-metric-alarm \
  --alarm-name ALB-Unhealthy-Targets \
  --metric-name UnHealthyHostCount \
  --namespace AWS/ApplicationELB \
  --statistic Maximum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=TargetGroup,Value=targetgroup/mytg/123456789 Name=LoadBalancer,Value=app/my-alb/123456789 \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:region:account:alerts
```
Common Pitfalls
- Health check returns redirect - 301/302 instead of 200 fails matcher
- Security group from wrong source - The target's inbound rule names a CIDR or group that does not cover the ALB; reference the ALB security group directly
- Wrong health check port - Port mismatch between config and target
- Too aggressive thresholds - A low unhealthy threshold with a short interval marks targets unhealthy after a brief blip
- Availability Zone mismatch - Target registered in a zone that is not enabled for the ALB
- Protocol mismatch - HTTP health check on HTTPS port
- Private DNS resolution - Target using private DNS not accessible
Best Practices
```hcl
# Complete Terraform configuration
resource "aws_lb_target_group" "app" {
  name     = "app-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200-299"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 10
    unhealthy_threshold = 3
  }

  deregistration_delay = 30

  tags = {
    Name = "app-target-group"
  }
}

# Security group for targets
resource "aws_security_group_rule" "allow_alb_health_check" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.alb.id
  security_group_id        = aws_security_group.app.id
}
```
Related Issues
- AWS ALB Target Unhealthy
- HAProxy Health Check Failing
- GCP Load Balancer Backend Down
- Azure Load Balancer Health Probe