Introduction
AWS ALB targets can remain unhealthy due to registration issues, failed health checks, or improper deregistration handling. Unlike basic health check failures, target unhealthy status may persist even after fixing the underlying application issue. This guide addresses registration timing, deregistration delays, and configuration problems that cause targets to stay unhealthy.
Symptoms
Error messages in AWS console:
Target is in unhealthy state
Target.RegistrationPending: Target is being registered
Target.DeregistrationInProgress: Target is being deregistered
Target.HealthCheckFailed: Health checks on this target are failing
Target.Disabled: Target is disabled by the load balancerObservable indicators: - Targets stuck in "draining" or "unhealthy" state - New instances not receiving traffic after registration - Old instances still receiving traffic after deregistration - CloudWatch shows connection errors during deployments - Rolling deployments causing service disruption
Common Causes
- 1.Deregistration delay too long - Connections draining slowly
- 2.Registration timing issues - Health checks starting before app ready
- 3.Target in wrong Availability Zone - Cross-zone routing disabled
- 4.Multiple target groups - Target registered incorrectly
- 5.Instance initialization delay - App not ready when health checks begin
- 6.Connection persistence - Existing connections not closing gracefully
- 7.Auto Scaling termination - Instance terminated before drained
Step-by-Step Fix
Step 1: Check Target Registration Status
```bash # Check detailed target health aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --query 'TargetHealthDescriptions[].{Target:Target.Id,Port:Target.Port,State:TargetHealth.State,Reason:TargetHealth.Reason,Description:TargetHealth.Description}'
# Check for draining targets
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
--query 'TargetHealthDescriptions[?TargetHealth.State==draining]'
# Check registration pending
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
--query 'TargetHealthDescriptions[?TargetHealth.State==initial]'
# List all registered targets aws elbv2 describe-targets \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 ```
Step 2: Check Deregistration Delay Configuration
```bash # Get target group attributes aws elbv2 describe-target-group-attributes \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789
# Check deregistration_delay.timeout_seconds # Default is 300 seconds (5 minutes) ```
# Modify deregistration delay
aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \
--attributes Key=deregistration_delay.timeout_seconds,Value=30Step 3: Fix Registration Timing
```bash # Wait for application to be ready before registering # Use a startup grace period
# Option 1: Register after app is healthy aws elbv2 register-targets \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --targets Id=i-new-instance,Port=80
# Option 2: Use connection draining during registration # Configure in Terraform ```
```hcl resource "aws_lb_target_group" "app" { name = "app-tg" port = 80 protocol = "HTTP" vpc_id = var.vpc_id
health_check { enabled = true healthy_threshold = 2 interval = 30 matcher = "200-299" path = "/health" timeout = 10 unhealthy_threshold = 5 # Higher threshold for startup }
deregistration_delay = 30 # Shorter delay for faster deployment
# Slow start mode for new targets slow_start = 30 # 30 seconds to ramp up traffic } ```
Step 4: Force Target Deregistration
```bash # Force deregister stuck targets aws elbv2 deregister-targets \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --targets Id=i-old-instance,Port=80
# Verify deregistration started aws elbv2 describe-target-health \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --targets Id=i-old-instance ```
Step 5: Fix Cross-Zone Load Balancing
bash
# Check cross-zone load balancing attribute
aws elbv2 describe-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/123456789 \
--query 'Attributes[?Key==load_balancing.algorithm.type`]'
# Enable cross-zone for better distribution aws elbv2 modify-load-balancer-attributes \ --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/123456789 \ --attributes Key=load_balancing.algorithm.type,Value=round_robin Key=load_balancing.cross_zone.enabled,Value=true ```
Step 6: Handle Auto Scaling Lifecycle
```bash # Add lifecycle hook for graceful deregistration aws autoscaling put-lifecycle-hook \ --auto-scaling-group-name my-asg \ --lifecycle-hook-name alb-draining-hook \ --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \ --default-result CONTINUE \ --heartbeat-timeout 300 \ --notification-target-arn arn:aws:sns:region:account:asg-notifications \ --role-arn arn:aws:iam::account:role/asg-lifecycle-role
# Lambda function to deregister from ALB # Triggered by SNS notification ```
```python # Lambda for graceful deregistration import boto3
def lambda_handler(event, context): elbv2 = boto3.client('elbv2') autoscaling = boto3.client('autoscaling')
instance_id = event['EC2InstanceId'] asg_name = event['AutoScalingGroupName']
# Find target groups for this ASG target_groups = get_target_groups_for_asg(asg_name)
# Deregister instance for tg_arn in target_groups: elbv2.deregister_targets( TargetGroupArn=tg_arn, Targets=[{'Id': instance_id}] )
# Wait for deregistration wait_for_deregistration(target_groups, instance_id)
# Complete lifecycle action autoscaling.complete_lifecycle_action( LifecycleHookName='alb-draining-hook', AutoScalingGroupName=asg_name, LifecycleActionResult='CONTINUE', InstanceId=instance_id ) ```
Step 7: Monitor Connection Counts
```bash # Check active connections via CloudWatch aws cloudwatch get-metric-statistics \ --namespace AWS/ApplicationELB \ --metric-name ActiveConnectionCount \ --dimensions Name=LoadBalancer,Value=my-alb \ --start-time $(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \ --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \ --period 60 \ --statistics Sum
# Check target-level connection metrics aws cloudwatch get-metric-statistics \ --namespace AWS/ApplicationELB \ --metric-name TargetConnectionCount \ --dimensions Name=TargetGroup,Value=mytg Name=TargetId,Value=i-1234567890 \ --period 60 \ --statistics Sum ```
Step 8: Verify and Test Deployment
```bash # Test deployment with new instance aws elbv2 register-targets \ --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 \ --targets Id=i-new,Port=80
# Monitor health status change watch -n 5 'aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 --query TargetHealthDescriptions[].{Id:Target.Id,State:TargetHealth.State}'
# Verify traffic routing curl -v http://my-alb.region.elb.amazonaws.com/health ```
Advanced Diagnosis
Check ASG Attachment
```bash # Verify ASG attached to target group aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names my-asg \ --query 'AutoScalingGroups[0].TargetGroupARNs'
# Attach target group to ASG aws autoscaling attach-load-balancer-target-groups \ --auto-scaling-group-name my-asg \ --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/mytg/123456789 ```
Debug Registration Flow
```bash # Enable ELB access logs aws elbv2 modify-load-balancer-attributes \ --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/123456789 \ --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=log-bucket
# Check registration events in CloudTrail aws cloudtrail lookup-events \ --lookup-attributes AttributeKey=ResourceName,AttributeValue=mytg \ --event-name RegisterTargets ```
Multiple Target Groups
# Check all target groups a target is registered to
for tg in $(aws elbv2 describe-target-groups --query 'TargetGroups[].TargetGroupArn' --output text); do
echo "=== $tg ==="
aws elbv2 describe-target-health --target-group-arn $tg --query 'TargetHealthDescriptions[?Target.Id==`i-1234567890`]'
doneCommon Pitfalls
- Deregistration delay too high - Stuck targets for minutes after deployment
- No slow start mode - New targets overwhelmed immediately
- Missing lifecycle hook - ASG terminates before connections drain
- Cross-zone disabled - Targets in one AZ not reachable from ALB nodes
- Registration before app ready - Health checks fail during startup
- Health check threshold too aggressive - Single failure marks unhealthy
- Port mismatch in registration - Wrong port registered
Best Practices
```hcl # Complete deployment configuration resource "aws_lb_target_group" "app" { name = "app-tg" port = 80 protocol = "HTTP" vpc_id = var.vpc_id target_type = "instance"
health_check { enabled = true healthy_threshold = 2 interval = 30 matcher = "200-299" path = "/health" timeout = 10 unhealthy_threshold = 3 }
deregistration_delay = 30 slow_start = 60 # Gradual traffic ramp-up
stickiness { enabled = true type = "lb_cookie" duration_seconds = 86400 } }
resource "aws_autoscaling_lifecycle_hook" "draining" { name = "alb-draining" auto_scaling_group_name = aws_autoscaling_group.app.name lifecycle_transition = "autoscaling:EC2_INSTANCE_TERMINATING" default_result = "CONTINUE" heartbeat_timeout = 300 notification_target_arn = aws_sns_topic.asg_notifications.arn role_arn = aws_iam_role.lifecycle.arn }
resource "aws_lb" "app" { name = "app-alb" internal = false load_balancer_type = "application" security_groups = [aws_security_group.alb.id] subnets = aws_subnet.public[*].id
enable_cross_zone_load_balancing = true } ```
Related Issues
- AWS ALB Health Check Failing
- HAProxy Backend Down
- GCP Load Balancer Backend Down
- Azure Load Balancer Health Probe