# AWS ECS Task Stopped
## Common Error Patterns

ECS task failures typically show one of the following messages:

```
Task stopped at: 2024-01-15T10:30:00Z. Reason: Essential container in task exited
CannotPullContainerError: inspect image has been retried 5 time(s)
ResourceInitializationError: failed to validate logger args
Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:...)
```

## Root Causes and Solutions
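
Before working through the causes below, it can help to sort a task's stop reason into one of them. The following is a minimal sketch, not an official mapping: `classify_stop_reason` is a hypothetical helper, and it only recognises the four messages listed under Common Error Patterns.

```bash
# Map an ECS stoppedReason string to the section of this guide that covers it.
# Heuristic only: anything not matched here is reported as "unclassified".
classify_stop_reason() {
  case "$1" in
    *CannotPullContainerError*)             echo "2. Image Pull Failures" ;;
    *"failed ELB health checks"*)           echo "3. Health Check Failures" ;;
    *"failed to validate logger args"*)     echo "7. Task Definition Issues" ;;
    *"Essential container in task exited"*) echo "1. Container Exit Code Analysis" ;;
    *)                                      echo "unclassified" ;;
  esac
}

# Example (cluster and task names are placeholders):
# classify_stop_reason "$(aws ecs describe-tasks --cluster my-cluster \
#   --tasks my-task-id --query 'tasks[0].stoppedReason' --output text)"
```

The patterns are ordered from most to least specific, so a message that mentions several conditions is matched by the narrowest one first.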
### 1. Container Exit Code Analysis

Check the container exit code to understand the failure:

```bash
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks arn:aws:ecs:us-east-1:123456789012:task/my-task \
  --query 'tasks[0].containers[*].[name,exitCode,reason]'
```

| Exit Code | Meaning | Common Cause |
|---|---|---|
| 0 | Success | Application completed intentionally |
| 1 | General error | Application error |
| 137 | SIGKILL | Out of memory or manual kill |
| 139 | Segmentation fault | Application bug |
| 255 | Exit status out of range | Application error |
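
Exit codes above 128 follow the shell convention of 128 plus the signal number, which is why 137 corresponds to SIGKILL (9) and 139 to SIGSEGV (11). A small sketch of that arithmetic, with `explain_exit_code` as a hypothetical helper name:

```bash
# Describe a container exit code using the 128 + signal convention.
# Runs in a subshell so its variables do not leak into the caller.
explain_exit_code() (
  code="$1"
  case "$code" in
    ""|*[!0-9]*) echo "no exit code recorded" ;;
    *)
      if [ "$code" -eq 0 ]; then
        echo "exited cleanly"
      elif [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application exit status $code"
      fi
      ;;
  esac
)

# Example:
# explain_exit_code 137
```

The non-numeric branch covers tasks where `exitCode` is empty, for example when the container never started.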

Solution for Exit Code 137:

Increase the task memory. The container-level `memory` value is a hard limit, and a container that exceeds it is killed:

```bash
aws ecs register-task-definition \
  --family my-task \
  --memory 1024 \
  --container-definitions '[
    {
      "name": "my-container",
      "image": "my-image",
      "memory": 512,
      "memoryReservation": 256
    }
  ]'
```

### 2. Image Pull Failures
ECS cannot pull the container image.

Solution:

Check that the image exists and is accessible:

```bash
# Verify the image exists
docker pull my-registry/my-image:latest

# For ECR, confirm the tag exists and that you have permission to read it
aws ecr describe-images \
  --repository-name my-repo \
  --image-ids imageTag=latest
```
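
When the image reference comes straight from a task definition, the repository name and tag have to be separated before they can be passed to `describe-images`. The sketch below does this with POSIX parameter expansion. `ecr_repository` and `ecr_tag` are hypothetical helpers, and digest references (`repo@sha256:...`) are not handled.

```bash
# Extract the repository path from "<registry>/<repository>[:<tag>]".
ecr_repository() (
  rest="${1#*/}"          # drop the registry host up to the first slash
  echo "${rest%%:*}"      # drop the tag, if any
)

# Extract the tag, defaulting to "latest" when none is given.
ecr_tag() {
  case "${1#*/}" in
    *:*) echo "${1##*:}" ;;
    *)   echo "latest" ;;
  esac
}

# Example (the image URI is a placeholder):
# image=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
# aws ecr describe-images \
#   --repository-name "$(ecr_repository "$image")" \
#   --image-ids imageTag="$(ecr_tag "$image")"
```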
Ensure the task execution role has ECR permissions, plus permission to write container logs to CloudWatch Logs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

### 3. Health Check Failures
The container fails its health checks and is stopped.

Solution:

Review the health check configuration. The grace period is a property of the service itself, not of a deployment:

```bash
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].healthCheckGracePeriodSeconds'
```

Increase the health check grace period:
```bash
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --health-check-grace-period-seconds 300
```

Verify the container health check:
```json
{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
  }
}
```

### 4. Resource Constraints
The task has insufficient CPU or memory.

Solution:

Check the service events:

```bash
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].events[:5]'
```

Increase the task resources:
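```bash
# container-definitions.json is a placeholder for your existing container definitions
aws ecs register-task-definition \
  --family my-task \
  --cpu 512 \
  --memory 1024 \
  --network-mode awsvpc \
  --requires-compatibilities FARGATE \
  --container-definitions file://container-definitions.json
```

### 5. Environment Variable Issues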
Environment variables are missing or incorrect.

Solution:

Check the task definition environment:

```bash
aws ecs describe-task-definition \
  --task-definition my-task \
  --query 'taskDefinition.containerDefinitions[0].environment'
```

Use Secrets Manager or Parameter Store for sensitive values. The task execution role needs permission to read the referenced secret:
```json
{
  "secrets": [
    {
      "name": "DATABASE_PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret"
    }
  ]
}
```

### 6. Network Configuration Issues
The task cannot communicate with required services.

Solution:

Verify the VPC configuration:

```bash
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].networkConfiguration'
```

Check that security groups allow the required traffic:
```bash
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0
```

Ensure tasks can reach:

- Container registry (ECR, Docker Hub)
- External services (APIs, databases)
- Internal services (ALB, other services)
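
Reachability of those endpoints can be probed from inside a running container. The sketch below assumes the image ships `bash` and coreutils `timeout`, which many minimal images do not. `can_reach` and `check_targets` are hypothetical helper names, and the hostnames in the example are placeholders.

```bash
# Return 0 if a TCP connection to host ($1) and port ($2) opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
can_reach() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print one OK/FAIL line per "host:port" argument.
check_targets() {
  for target in "$@"; do
    if can_reach "${target%:*}" "${target##*:}"; then
      echo "OK   $target"
    else
      echo "FAIL $target"
    fi
  done
}

# Example:
# check_targets api.ecr.us-east-1.amazonaws.com:443 my-db.internal:5432
```

A failed probe only shows that the TCP handshake did not complete. It does not distinguish a security group rule from a missing route or a DNS failure.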
### 7. Task Definition Issues

The task definition is invalid or outdated.

Solution:

Validate the task definition:

```bash
aws ecs describe-task-definition \
  --task-definition my-task:1 \
  --query 'taskDefinition'
```

Common issues:

- Invalid log configuration
- Missing essential flag on container
- Invalid port mappings
- Incorrect CPU/memory ratio
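
On Fargate, the CPU/memory ratio is constrained to a fixed set of combinations, so a quick local check can catch an invalid pair before registration. The sketch below encodes the combinations as documented for Fargate at the time of writing. Treat the table as an assumption and confirm it against the current AWS documentation. `valid_fargate_size` is a hypothetical helper.

```bash
# Succeed if the CPU units ($1) and memory in MiB ($2) form a Fargate-supported pair.
# Runs in a subshell so `exit` only leaves the function.
valid_fargate_size() (
  cpu="$1"; mem="$2"
  case "$cpu" in
    256)   case "$mem" in 512|1024|2048) exit 0 ;; *) exit 1 ;; esac ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *)     exit 1 ;;
  esac
  # Memory must sit inside the range and on a step boundary.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $(( (mem - min) % step )) -eq 0 ]
)

# Example:
# valid_fargate_size 512 1024 && echo "ok" || echo "unsupported CPU/memory pair"
```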
## Debugging Commands

```bash
# Get task details
aws ecs describe-tasks --cluster my-cluster --tasks my-task-id

# View task logs
aws logs get-log-events \
  --log-group-name /ecs/my-task \
  --log-stream-name ecs/my-container/my-task-id

# List stopped tasks
aws ecs list-tasks \
  --cluster my-cluster \
  --desired-status STOPPED

# Get task definition
aws ecs describe-task-definition --task-definition my-task

# Check service events
aws ecs describe-services --cluster my-cluster --services my-service
```
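
The list and describe commands above can be chained to print the stop reason for every recently stopped task in one pass. This is a sketch under two assumptions: the cluster has at most 100 stopped tasks visible (the `describe-tasks` limit per call), and `stopped_task_reasons` is a hypothetical helper name.

```bash
# Print task ARN, stop code and stop reason for each stopped task in a cluster.
# Runs in a subshell so `exit` only leaves the function.
stopped_task_reasons() (
  cluster="$1"
  arns=$(aws ecs list-tasks \
           --cluster "$cluster" \
           --desired-status STOPPED \
           --query 'taskArns[]' \
           --output text)
  case "$arns" in
    ""|None) echo "no stopped tasks found in $cluster"; exit 0 ;;
  esac
  # $arns is intentionally unquoted so each ARN becomes its own argument.
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks $arns \
    --query 'tasks[].[taskArn,stopCode,stoppedReason]' \
    --output text
)

# Example:
# stopped_task_reasons my-cluster
```

ECS only keeps stopped tasks visible for a limited time, so an empty result does not prove that nothing failed earlier.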
## Fargate-Specific Issues

### Platform Version Issues

```bash
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --platform-version LATEST
```

### Subnet Configuration

Ensure tasks have:

- A route to the internet for public images: either a public subnet with a public IP assigned, or a private subnet that routes through a NAT gateway
- VPC endpoints for ECR when running in a private subnet without a NAT gateway
- Security groups allowing required traffic
## ECS Exec Debugging

Enable ECS Exec for interactive debugging:

```bash
# Enable ECS Exec
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --enable-execute-command

# Connect to container
aws ecs execute-command \
  --cluster my-cluster \
  --task my-task-id \
  --container my-container \
  --command "/bin/bash" \
  --interactive
```
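
ECS Exec only applies to tasks launched after the flag is set, so a task that predates the `update-service` call will reject `execute-command` until the service is redeployed (for example with `aws ecs update-service --force-new-deployment`). The sketch below checks a task first. `exec_enabled` is a hypothetical helper, and the cluster and task names are placeholders.

```bash
# Succeed only if the given task was launched with ECS Exec enabled.
# With --output text, the AWS CLI prints booleans as "True" or "False".
exec_enabled() {
  [ "$(aws ecs describe-tasks \
        --cluster "$1" \
        --tasks "$2" \
        --query 'tasks[0].enableExecuteCommand' \
        --output text)" = "True" ]
}

# Example:
# exec_enabled my-cluster my-task-id \
#   || echo "task was started without ECS Exec; redeploy the service first"
```

A passing check does not guarantee that a session will open. The task role also needs the SSM messaging permissions that ECS Exec relies on.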
## Prevention Tips

1. Set up CloudWatch alarms for task failures
2. Use health checks with appropriate grace periods
3. Configure proper resource limits
4. Implement circuit breakers in code
5. Use X-Ray for distributed tracing
## Quick Reference

| Issue | Command |
|---|---|
| View task status | `aws ecs describe-tasks` |
| Check exit codes | `aws ecs describe-tasks --query 'tasks[0].containers[*].exitCode'` |
| View service events | `aws ecs describe-services` |
| Check logs | `aws logs get-log-events` |
| List stopped tasks | `aws ecs list-tasks --desired-status STOPPED` |