Introduction

When an Amazon ECS task remains stuck in the PENDING state and never transitions to RUNNING, your service becomes unavailable. This is one of the most common ECS issues in production, often caused by resource constraints, missing IAM permissions, or VPC misconfigurations.

Symptoms

- ECS task status stays at PENDING for more than 5 minutes
- No containers appear in `docker ps` on the container instance
- CloudWatch Events show TaskStateChange to PENDING but no RUNNING event
- Service shows desired count > running count
- No application logs in CloudWatch Logs
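
The symptoms above can be confirmed from the AWS CLI. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, the cluster and service names are placeholders you pass in, and `pending_tasks` assumes the cluster has at most 100 tasks, since `describe-tasks` accepts at most 100 per call.

```bash
# Sketch: confirm the symptoms from the CLI. All names passed in are placeholders.

# Print desired, running, and pending task counts for a service.
service_counts() {
  # $1: cluster, $2: service
  aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].[desiredCount,runningCount,pendingCount]' \
    --output text
}

# Print ARN and creation time of tasks whose last status is still PENDING.
pending_tasks() {
  # $1: cluster
  arns=$(aws ecs list-tasks --cluster "$1" --desired-status RUNNING \
    --query 'taskArns' --output text)
  # describe-tasks rejects an empty task list, so stop early.
  if [ -z "$arns" ] || [ "$arns" = "None" ]; then return 0; fi
  # Word splitting of $arns is intentional: one argument per ARN.
  aws ecs describe-tasks --cluster "$1" --tasks $arns \
    --query "tasks[?lastStatus=='PENDING'].[taskArn,createdAt]" \
    --output text
}

# Print how many tasks are missing: desired minus running, floored at 0.
missing_tasks() {
  # $1: desired count, $2: running count
  gap=$(( $1 - $2 ))
  if [ "$gap" -lt 0 ]; then gap=0; fi
  echo "$gap"
}
```

A non-zero result from `missing_tasks` combined with entries from `pending_tasks` that are older than about 5 minutes matches the symptoms listed above.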

Common Causes

- Insufficient CPU or memory resources on the container instance
- Missing ecsTaskExecutionRole IAM permissions
- Security group blocking ECR pull or Secrets Manager access
- Container image pull failure (invalid URI, missing ECR permissions)
- Subnet has no route to the internet (a private subnet without a NAT gateway or VPC endpoints cannot reach ECR)
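
As a rough triage aid, an error string can be matched against these causes. The sketch below is a heuristic only: the function name and the cause labels it prints are hypothetical, and the patterns cover a handful of commonly seen messages rather than a complete ECS error catalogue.

```bash
# Sketch: map an error string to the most likely cause from the list above.
# The patterns are heuristics, not an official or complete error catalogue.
likely_cause() {
  # $1: stoppedReason, service event message, or ecs-agent log line
  case "$1" in
    *CannotPullContainerError*|*"Unable to pull image"*)
      echo "image-pull" ;;
    *AccessDenied*)
      echo "iam-permissions" ;;
    *ResourceInitializationError*)
      echo "execution-role-or-network" ;;
    *"insufficient CPU"*|*"insufficient memory"*|*RESOURCE:CPU*|*RESOURCE:MEMORY*)
      echo "insufficient-resources" ;;
    *)
      echo "unknown" ;;
  esac
}
```

The first matching pattern wins, so a message that mentions both a pull failure and an access error is reported as an image pull problem; treat the label as a starting point for investigation, not a diagnosis.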

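Step-by-Step Fix

1. **Check the stopped reason code**:
   ```bash
   aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
     --query 'tasks[0].stoppedReason'
   ```
   Look for "ResourceInitializationError" or "Essential container exited". The field is only populated once ECS has stopped the task; for a task that is still PENDING the query returns `null`.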

2. **Verify container instance resources**:
   ```bash
   aws ecs describe-container-instances --cluster my-cluster \
     --container-instances <instance-id> \
     --query 'containerInstances[0].remainingResources'
   ```
   Ensure enough CPU units and memory (MiB) are available.

3. **Check the ecs-agent logs**:
   ```bash
   sudo journalctl -u ecs -f --since "10 minutes ago"
   ```
   Look for "Unable to pull image" or "AccessDenied" errors.

4. **Verify the task execution role**:
   ```bash
   aws iam get-role --role-name ecsTaskExecutionRole
   aws iam list-attached-role-policies --role-name ecsTaskExecutionRole
   ```
   The role must have AmazonECSTaskExecutionRolePolicy attached.

5. **Test ECR image pull manually**:
   ```bash
   aws ecr get-login-password --region us-east-1 | \
     docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
   docker pull <account>.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
   ```

6. **Verify VPC networking for private subnets**:
   ```bash
   aws ec2 describe-route-tables --route-table-ids <rtb-id> \
     --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0']"
   ```
   Private subnets need a default route through a NAT gateway (or ECR VPC endpoints) to pull images from ECR.
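
The resource check in the steps above can be turned into a simple pass/fail comparison. This is a minimal sketch under stated assumptions: the function names are hypothetical, the cluster and instance ID are placeholders, and the task's CPU and memory requirements are supplied by you (for example, from the task definition).

```bash
# Sketch: compare an instance's remaining capacity with what the task needs.
# Cluster and container instance ID are placeholders passed by the caller.

# Print the remaining integer value for one resource (CPU or MEMORY).
remaining_resource() {
  # $1: cluster, $2: container instance ID or ARN, $3: CPU or MEMORY
  aws ecs describe-container-instances --cluster "$1" \
    --container-instances "$2" \
    --query "containerInstances[0].remainingResources[?name=='$3'].integerValue | [0]" \
    --output text
}

# Succeed (exit 0) only if both CPU units and memory (MiB) are sufficient.
task_fits() {
  # $1: remaining CPU, $2: remaining memory, $3: task CPU, $4: task memory
  [ "$1" -ge "$3" ] && [ "$2" -ge "$4" ]
}
```

For example, `task_fits 1024 2048 512 1024` succeeds, while `task_fits 256 2048 512 1024` fails because only 256 CPU units remain for a task that needs 512.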

Prevention

- Set up CloudWatch alarms for the ECS service running count dropping below the desired count
- Use Fargate to eliminate container instance resource contention
- Add deployment circuit breakers: `aws ecs update-service --cluster my-cluster --service <service-name> --deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}"`
- Pin container image tags instead of using :latest
- Use VPC endpoints for ECR (com.amazonaws.region.ecr.api and com.amazonaws.region.ecr.dkr, plus an S3 gateway endpoint for image layers) to avoid NAT dependency
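
The VPC endpoint recommendation can be sketched with the AWS CLI as follows. The function names are hypothetical, and the VPC, subnet, security group, and route table IDs are placeholders supplied by the caller; this assumes one subnet and one security group per interface endpoint for brevity.

```bash
# Sketch: VPC endpoints that let private subnets pull from ECR without a NAT gateway.
# VPC, subnet, security group, and route table IDs are placeholders.

# Print the endpoint service names needed for ECR pulls in a region.
ecr_endpoint_services() {
  # $1: AWS region, e.g. us-east-1
  printf '%s\n' \
    "com.amazonaws.$1.ecr.api" \
    "com.amazonaws.$1.ecr.dkr" \
    "com.amazonaws.$1.s3"
}

# Create one interface endpoint (used for ecr.api and ecr.dkr).
create_interface_endpoint() {
  # $1: VPC ID, $2: service name, $3: subnet ID, $4: security group ID
  aws ec2 create-vpc-endpoint --vpc-id "$1" \
    --vpc-endpoint-type Interface --service-name "$2" \
    --subnet-ids "$3" --security-group-ids "$4" \
    --private-dns-enabled
}

# Create the S3 gateway endpoint that serves image layer downloads.
create_s3_gateway_endpoint() {
  # $1: VPC ID, $2: region, $3: route table ID
  aws ec2 create-vpc-endpoint --vpc-id "$1" \
    --vpc-endpoint-type Gateway --service-name "com.amazonaws.$2.s3" \
    --route-table-ids "$3"
}
```

Tasks that send logs to CloudWatch Logs or read from Secrets Manager will likely need interface endpoints for those services as well, and the endpoint security group must allow HTTPS (port 443) from the task subnets.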