Introduction
When an ECR lifecycle policy deletes images that are still referenced by ECS task definitions, Kubernetes deployments, or Docker Compose files, subsequent image pulls fail with ImagePullBackOff or ManifestUnknown errors. This commonly happens when lifecycle rules are too aggressive with the "untagged" or "image count more than N" settings.
Symptoms
- ECS task fails with `CannotPullContainerError: failed to resolve reference "account.dkr.ecr.region.amazonaws.com/repo:tag": manifest unknown`
- Kubernetes pod shows `ErrImagePull` with `manifest unknown` or `image not found`
- ECR `describe-images` returns no results for the expected tag
- CloudTrail shows `PolicyExecutionEvent` (lifecycle expiry) or `BatchDeleteImage` (explicit deletion) events around the time of the failure
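To confirm the third symptom quickly, a small wrapper around `aws ecr describe-images` can report whether a tag still resolves. This is a minimal sketch: the function name `ecr_tag_exists` and the `my-repo` / `prod-1.4.2` values are hypothetical, and it assumes the AWS CLI is configured for the account and region that own the repository.

```bash
# Hypothetical helper: exits 0 when repo:tag still exists in ECR, non-zero otherwise.
# describe-images fails (ImageNotFoundException) when the tag is gone.
ecr_tag_exists() {
  local repo="$1" tag="$2"
  aws ecr describe-images \
    --repository-name "$repo" \
    --image-ids imageTag="$tag" \
    --query 'imageDetails[0].imageDigest' \
    --output text >/dev/null 2>&1
}
```

Usage: `ecr_tag_exists my-repo prod-1.4.2 || echo "tag is missing from ECR"`.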
Common Causes
- A count-based rule (for example, "image count more than 10") with `tagStatus` set to `any` expires tagged images along with untagged ones
- An "untagged" rule removes images that lost their tag during redeployment
- Multiple lifecycle rules with conflicting priorities (rules are applied in ascending `rulePriority` order)
- The CI/CD pipeline reuses tags (e.g., "latest"), so each push leaves the previous image untagged
- Count-based rules are not scoped to specific tags with `tagPrefixList`
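One way to check whether tag reuse is leaving images untagged is to list the untagged digests in the repository. The sketch below assumes a configured AWS CLI; the function name `list_untagged_digests` and the repository name `my-repo` are illustrative, not part of the original setup.

```bash
# Hypothetical helper: prints one digest per line for every untagged image.
# A steadily growing list usually means a tag such as "latest" is being reused.
list_untagged_digests() {
  local repo="$1"
  aws ecr list-images \
    --repository-name "$repo" \
    --filter tagStatus=UNTAGGED \
    --query 'imageIds[*].[imageDigest]' \
    --output text
}
```

Usage: `list_untagged_digests my-repo | wc -l` gives a rough count of images that an "untagged" rule would be allowed to expire.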
Step-by-Step Fix
1. Identify deleted images in CloudTrail. Lifecycle expirations are logged as `PolicyExecutionEvent`; explicit deletions appear as `BatchDeleteImage`:
   ```bash
   # GNU date syntax; on macOS/BSD use: date -v-24H +%s
   aws cloudtrail lookup-events \
     --lookup-attributes AttributeKey=EventName,AttributeValue=PolicyExecutionEvent \
     --start-time "$(date -d '24 hours ago' +%s)" \
     --query 'Events[*].{Time:EventTime,Event:EventName,User:Username}'
   ```
2. Check the current lifecycle policy:
   ```bash
   aws ecr get-lifecycle-policy --repository-name my-repo
   ```
   Look for rules with `"tagStatus": "untagged"` or `"tagStatus": "any"`, and for `"countType": "imageCountMoreThan"` rules that are not restricted to specific tags with `tagPrefixList`.
3. Update the lifecycle policy to protect tagged images. The example keeps the 20 most recent images whose tags start with `prod` or `staging` and expires untagged images after 7 days:
   ```bash
   aws ecr put-lifecycle-policy \
     --repository-name my-repo \
     --lifecycle-policy-text '{
       "rules": [
         {
           "rulePriority": 1,
           "description": "Keep last 20 tagged images",
           "selection": {
             "tagStatus": "tagged",
             "tagPrefixList": ["prod", "staging"],
             "countType": "imageCountMoreThan",
             "countNumber": 20
           },
           "action": {"type": "expire"}
         },
         {
           "rulePriority": 2,
           "description": "Delete untagged images older than 7 days",
           "selection": {
             "tagStatus": "untagged",
             "countType": "sinceImagePushed",
             "countUnit": "days",
             "countNumber": 7
           },
           "action": {"type": "expire"}
         }
       ]
     }'
   ```
4. Rebuild and push the missing image:
   ```bash
   docker build -t account.dkr.ecr.region.amazonaws.com/repo:tag .
   aws ecr get-login-password | docker login --username AWS --password-stdin account.dkr.ecr.region.amazonaws.com
   docker push account.dkr.ecr.region.amazonaws.com/repo:tag
   ```
5. Restart affected services:
   ```bash
   aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment
   ```
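Before applying a new policy, it can help to dry-run it with ECR's lifecycle policy preview, which reports the images a policy would expire without deleting anything. The following is a sketch under stated assumptions: the function name `preview_lifecycle_policy` and the file name `policy.json` are hypothetical, and the policy JSON from the update step is assumed to be saved in that file.

```bash
# Hypothetical helper: previews a lifecycle policy without expiring any image.
preview_lifecycle_policy() {
  local repo="$1" policy_file="$2"

  # Start the asynchronous preview for the candidate policy.
  aws ecr start-lifecycle-policy-preview \
    --repository-name "$repo" \
    --lifecycle-policy-text "file://$policy_file" >/dev/null || return 1

  # Block until ECR reports that the preview has finished.
  aws ecr wait lifecycle-policy-preview-complete \
    --repository-name "$repo" || return 1

  # Show which tags would be expired and by which rule priority.
  aws ecr get-lifecycle-policy-preview \
    --repository-name "$repo" \
    --query 'previewResults[*].{Tags:imageTags,Rule:appliedRulePriority,Action:action.type}' \
    --output table
}
```

Usage: `preview_lifecycle_policy my-repo policy.json`. If a tag that is still deployed shows up in the output, adjust the rules before running `put-lifecycle-policy`.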
Prevention
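- Always use immutable tags (include the commit SHA or build number) instead of reusing tags such as "latest"
- Add `tagPrefixList` to count-based rules so production and staging image tags are handled by their own rule
- Set `countNumber` high enough to retain every image you might need to roll back to
- Monitor CloudTrail for `PutLifecyclePolicy` events so lifecycle policy changes are reviewed before they expire images
- Use `imageDigest` references in task definitions instead of mutable tags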
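The tagging advice above could be sketched as a few shell helpers. The naming scheme `build-<number>-<short sha>` and the function names are assumptions made for illustration; only `aws ecr put-image-tag-mutability` is an existing CLI command, and it makes ECR reject pushes that would overwrite an existing tag.

```bash
# Hypothetical naming scheme: build-<build number>-<first 12 chars of commit SHA>.
immutable_tag() {
  local build_number="$1" commit_sha="$2" short_sha
  short_sha=$(printf '%s' "$commit_sha" | cut -c1-12)
  printf 'build-%s-%s\n' "$build_number" "$short_sha"
}

# Compose a digest-pinned image reference: <registry>/<repo>@<digest>.
digest_ref() {
  local registry="$1" repo="$2" digest="$3"
  printf '%s/%s@%s\n' "$registry" "$repo" "$digest"
}

# Make ECR refuse pushes that would move an existing tag to a new image.
enable_tag_immutability() {
  local repo="$1"
  aws ecr put-image-tag-mutability \
    --repository-name "$repo" \
    --image-tag-mutability IMMUTABLE
}
```

Usage: `docker build -t "account.dkr.ecr.region.amazonaws.com/repo:$(immutable_tag "$BUILD_NUMBER" "$(git rev-parse HEAD)")" .`, where `BUILD_NUMBER` is whatever counter the CI system provides. Pinning by digest makes the reference unambiguous, but the image must still be retained by the lifecycle policy.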