# Fix AWS CloudFormation Stack Rollback
Your CloudFormation stack just failed and rolled back. Or worse, it's stuck in ROLLBACK_COMPLETE or UPDATE_ROLLBACK_FAILED. The error message is vague, and you're left wondering what went wrong and how to fix it.
CloudFormation rollbacks happen when a resource creation or update fails. By default, CloudFormation rolls back all changes to maintain consistency. But understanding *why* it failed and *how* to recover requires digging into the stack events and resources.
## Diagnosis Commands
Start by checking the stack status:
```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].[StackName,StackStatus,StackStatusReason]' \
  --output table
```

Get detailed events to find the failure:
```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED` || ResourceStatus==`UPDATE_FAILED`].[Timestamp,LogicalResourceId,ResourceType,ResourceStatusReason]' \
  --output table
```

The ResourceStatusReason usually tells you what went wrong. If it's truncated, get the full event details as JSON:
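```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]' \
  --output json | jq .
```

For nested stacks, the parent only reports that the nested stack failed. The actual error is recorded in the nested stack's own events, so list the nested stacks first: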
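```bash
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[?ResourceType==`AWS::CloudFormation::Stack`].[LogicalResourceId,PhysicalResourceId]' \
  --output table
```

Then check the events of each nested stack. Pass its PhysicalResourceId (the nested stack's ARN) as the stack name. The ARN still works after the nested stack has been deleted during rollback, but the name does not: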
```bash
aws cloudformation describe-stack-events \
  --stack-name nested-stack-arn \
  --query 'StackEvents[?contains(ResourceStatus, `FAILED`)]'
```

Check for resources left in a failed state:
```bash
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[?contains(ResourceStatus, `FAILED`)]' \
  --output table
```

If the stack is in UPDATE_ROLLBACK_FAILED, identify the resource that is blocking the rollback:
```bash
aws cloudformation describe-stack-resources \
  --stack-name my-stack \
  --query 'StackResources[?ResourceStatus==`UPDATE_ROLLBACK_FAILED` || ResourceStatus==`UPDATE_FAILED`]'
```

## Common Causes and Solutions
### Resource Already Exists

CloudFormation tries to create a resource whose name is already taken by a resource outside the stack:

```
Resource creation failed: s3-bucket-name already exists in account
```

Solution: Either use a different resource name or import the existing resource:
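```bash
# template.yaml must already declare MyBucket (with a DeletionPolicy attribute,
# which import requires) alongside every resource currently in the stack
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name import-bucket \
  --change-set-type IMPORT \
  --resources-to-import '[{"ResourceType":"AWS::S3::Bucket","LogicalResourceId":"MyBucket","ResourceIdentifier":{"BucketName":"existing-bucket-name"}}]' \
  --template-body file://template.yaml

# Wait for the change set, then run the import
aws cloudformation wait change-set-create-complete \
  --stack-name my-stack --change-set-name import-bucket
aws cloudformation execute-change-set \
  --stack-name my-stack --change-set-name import-bucket
aws cloudformation wait stack-import-complete --stack-name my-stack
```

Or modify your template to use a unique name: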
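```yaml
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${AWS::StackName}-my-bucket-${AWS::AccountId}"
```

Bucket names must be lowercase and at most 63 characters, so this pattern only works with short, lowercase stack names. If you omit BucketName entirely, CloudFormation generates a unique name for you.

### Insufficient IAM Permissions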
The identity that CloudFormation acts as (the stack's service role, or otherwise your own credentials) lacks permission to create a resource:

```
API: ec2:RunInstances User: arn:aws:sts::... is not authorized to perform: ec2:RunInstances
```

Check whether the stack has a service role:
```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].RoleARN'
```

If this returns an empty list, the stack has no service role, and CloudFormation uses the credentials of whoever runs each stack operation. Either grant that identity the missing permissions, or give the stack a dedicated service role:
```bash
# Create a service role that CloudFormation can assume
aws iam create-role \
  --role-name cloudformation-service-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "cloudformation.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach permissions (AdministratorAccess is very broad; see the note below)
aws iam attach-role-policy \
  --role-name cloudformation-service-role \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Update the stack to use the service role
aws cloudformation update-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --role-arn arn:aws:iam::123456789012:role/cloudformation-service-role \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
```

AdministratorAccess keeps the example short, but it grants far more than most stacks need, so scope the policy down for production. The identity that runs update-stack also needs iam:PassRole on the service role.
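Before redeploying, you can check that the role now allows the action named in the error. The sketch below wraps `aws iam simulate-principal-policy`. `check_action` is a hypothetical helper name, and the caller needs `iam:SimulatePrincipalPolicy`. A simulation only reflects the policies IAM evaluates for that call, so treat "allowed" as a strong hint rather than a guarantee.

```bash
# Hypothetical helper: ask IAM whether a principal may call one action.
# Prints the simulated decision and returns 0 only when it is "allowed".
check_action() {
  # $1 = ARN of the role (or user) CloudFormation acts as
  # $2 = action from the error message, e.g. ec2:RunInstances
  decision=$(aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names "$2" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text) || return
  echo "$2: $decision"
  [ "$decision" = "allowed" ]
}
```

For example, `check_action arn:aws:iam::123456789012:role/cloudformation-service-role ec2:RunInstances` prints the action followed by `allowed`, `implicitDeny`, or `explicitDeny`.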
### Resource Dependency Failure
A resource depends on another resource that failed to create:
```
Resource creation cancelled
```

This status is a symptom, not the cause. Once one resource fails, CloudFormation cancels any resources that are still being created. The earliest failure is usually the root cause, so list the failures oldest first:

```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'sort_by(StackEvents, &Timestamp)[?ResourceStatus==`CREATE_FAILED`]' \
  --output json | jq '.[:5]' # First 5 failed events, oldest first
```

Fix the resource behind the earliest failure first, then retry the deployment.
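If you run this often, you can push the filtering further: skip the "cancelled" cascade entirely and print only the earliest genuine failure. This sketch makes several assumptions. `root_cause` is a hypothetical name. The optional second argument is an ISO-8601 start time that ignores failures left over from earlier deployments of the same stack. The function also relies on the API's UTC timestamps sorting correctly as plain text.

```bash
# Hypothetical helper: print the earliest failed event that is not merely
# a cancellation caused by some other failure.
root_cause() {
  # $1 = stack name; $2 = optional ISO-8601 start time (older events are ignored)
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query 'StackEvents[?contains(ResourceStatus, `"FAILED"`)].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
    --output text |
    sort |
    awk -F '\t' -v since="${2:-}" \
      '$1 >= since && $4 !~ /^Resource (creation|update) cancelled/ { print; exit }'
}
```

For example, `root_cause my-stack 2024-05-01T10:00` prints one tab-separated line with the timestamp, logical ID, status, and reason. Without the second argument, it searches the stack's entire event history.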
### Timeout During Creation
Some resources take longer than the default timeout:
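```
Resource creation timed out
```

For custom resources, set the `ServiceTimeout` property, in seconds (up to 3600, which is also the default). Any other property, including one named `Timeout`, is simply passed to your function and does not change how long CloudFormation waits:

```yaml
Resources:
  MyCustomResource:
    Type: Custom::MyResource
    Properties:
      ServiceToken: !GetAtt MyFunction.Arn
      ServiceTimeout: 600 # seconds (10 minutes)
```

For AWS resource types, you cannot change how long CloudFormation waits on an individual resource. The main exception is a resource that waits for signals, such as an EC2 instance with a CreationPolicy. There you set the timeout yourself (for example `Timeout: PT15M` under `ResourceSignal`), and hitting it usually means the instance never sent the expected signal, not that it was slow to launch.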
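For an EC2 instance with a CreationPolicy, you can avoid waiting out the full timeout by always reporting the bootstrap result, success or failure. A broken setup then fails the stack right away with a clear reason. Below is a minimal user-data sketch, assuming Amazon Linux, where `cfn-signal` is installed under `/opt/aws/bin`. The stack, resource, and region values are placeholders you would normally inject with `!Sub`, and `run_and_signal` is a hypothetical helper.

```bash
# Placeholders: inject real values from the template (e.g. with !Sub) in user data.
STACK_NAME="my-stack"
RESOURCE_ID="MyInstance"
REGION="us-east-1"
# Default cfn-signal location on Amazon Linux; overridable for testing.
CFN_SIGNAL="${CFN_SIGNAL:-/opt/aws/bin/cfn-signal}"

# Hypothetical helper: run one bootstrap command, then send its exit code to
# CloudFormation either way. A non-zero code makes the CreationPolicy fail
# immediately instead of waiting for the timeout.
run_and_signal() {
  "$@"
  rc=$?
  "$CFN_SIGNAL" -e "$rc" --stack "$STACK_NAME" --resource "$RESOURCE_ID" --region "$REGION"
  return "$rc"
}
```

For example, `run_and_signal yum install -y httpd` installs the package, then signals `-e 0` on success or the command's non-zero exit code on failure.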
### Invalid Property Values

The template contains unsupported properties or invalid property values:

```
Encountered unsupported property Name
```

Or:

```
Property validation failure: Value 'invalid-value' for property 'Property'
```

Validate your template before deploying:
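```bash
aws cloudformation validate-template \
  --template-body file://template.yaml

# validate-template checks syntax and structure, not resource properties.
# cfn-lint (pip install cfn-lint) also flags unsupported properties and many invalid values.
cfn-lint template.yaml
```

Check the resource type's reference page in the AWS documentation for valid values. For some properties you can also query the service. For example, to list valid EC2 instance types: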
```bash
aws ec2 describe-instance-types \
  --filters Name=instance-type,Values=t3.* \
  --query 'InstanceTypes[*].InstanceType'
```

### VPC or Subnet Issues
Resources fail to create in the specified VPC/subnet:
```
The subnet ID 'subnet-12345' does not exist
```

Or:

```
VPC vpc-12345 has no internet gateway
```

Verify the VPC and subnet exist and are correct:
```bash
# Check VPC exists
aws ec2 describe-vpcs \
  --vpc-ids vpc-12345

# Check subnet exists and is in correct VPC
aws ec2 describe-subnets \
  --subnet-ids subnet-12345 \
  --query 'Subnets[*].[SubnetId,VpcId,AvailabilityZone,CidrBlock]'

# Check internet gateway
aws ec2 describe-internet-gateways \
  --filters Name=attachment.vpc-id,Values=vpc-12345
```
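You can bundle the subnet checks above into a pre-flight function and run it against the subnet IDs in your parameters file before deploying. This is a sketch: `check_subnets` is a hypothetical name, and it only checks that each subnet exists and belongs to the expected VPC, not routing or internet gateway attachment.

```bash
# Hypothetical pre-flight check: every subnet must exist and belong to the expected VPC.
check_subnets() {
  # $1 = expected VPC ID; remaining arguments = subnet IDs
  want_vpc=$1
  shift
  failed=0
  for subnet in "$@"; do
    # describe-subnets exits non-zero (InvalidSubnetID.NotFound) for unknown IDs
    if ! vpc=$(aws ec2 describe-subnets --subnet-ids "$subnet" \
        --query 'Subnets[0].VpcId' --output text 2>/dev/null); then
      echo "MISSING: $subnet not found"
      failed=1
    elif [ "$vpc" != "$want_vpc" ]; then
      echo "WRONG VPC: $subnet is in $vpc, expected $want_vpc"
      failed=1
    else
      echo "OK: $subnet is in $vpc"
    fi
  done
  return "$failed"
}
```

For example, `check_subnets vpc-12345 subnet-12345 subnet-67890` prints one line per subnet. It returns non-zero if any subnet is missing or belongs to another VPC.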
### Update Rollback Failed

A stack in UPDATE_ROLLBACK_FAILED failed an update and then also failed to roll that update back. Start with the status reason:
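```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].[StackName,StackStatus,StackStatusReason]'
```

Fix the underlying cause if you can (for example, restore a resource that was changed or deleted outside CloudFormation), then continue the rollback: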
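```bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack
```

If the rollback fails again on a resource you cannot fix, skip that resource. Only resources in the UPDATE_FAILED state can be skipped. CloudFormation marks skipped resources as UPDATE_COMPLETE without changing them, so the real resource may no longer match the template. For a resource inside a nested stack, use the form `NestedStackName.ResourceLogicalId`: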
```bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MyProblematicResource
```

## Recovery Strategies
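Which recovery path applies depends on the stack's current status. The sketch below encodes that decision as a lookup, covering the statuses discussed in this guide plus a few closely related ones. `next_step` and `stack_next_step` are hypothetical names, and the advice strings are a starting point rather than an exhaustive rulebook.

```bash
# Hypothetical triage helper: map a stack status to a suggested recovery step.
next_step() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE|IMPORT_COMPLETE)
      echo "healthy: nothing to recover" ;;
    ROLLBACK_COMPLETE)
      echo "create failed and rolled back: delete the stack, fix the template, create it again" ;;
    ROLLBACK_FAILED)
      echo "create rollback failed: delete the stack, then create it again" ;;
    UPDATE_ROLLBACK_COMPLETE)
      echo "update rolled back cleanly: fix the template and update again" ;;
    UPDATE_ROLLBACK_FAILED)
      echo "rollback stuck: fix the failing resource, then run continue-update-rollback" ;;
    CREATE_FAILED|UPDATE_FAILED)
      echo "failed with rollback disabled: fix and retry, or run rollback-stack" ;;
    DELETE_FAILED)
      echo "delete failed: retry delete-stack, using --retain-resources for resources that cannot be deleted" ;;
    REVIEW_IN_PROGRESS)
      echo "change set created but never executed: execute or delete it" ;;
    *_IN_PROGRESS)
      echo "operation still running: wait for it to finish" ;;
    *)
      echo "unrecognised status: read the stack events first" ;;
  esac
}

# Look up the live status and print the suggested next step.
stack_next_step() {
  stack_status=$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text) || return
  echo "$1 is $stack_status -> $(next_step "$stack_status")"
}
```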
### Delete and Recreate

A stack in ROLLBACK_COMPLETE cannot be updated, so the only way forward is to delete it and create it again:
```bash
aws cloudformation delete-stack --stack-name my-stack

# Wait for deletion
aws cloudformation wait stack-delete-complete --stack-name my-stack

# Recreate
aws cloudformation create-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --parameters file://parameters.json \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
```
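Deleting a stack is destructive, so you may want to wrap these commands so they only run when the stack really is in ROLLBACK_COMPLETE. This sketch assumes the same `template.yaml` and `parameters.json` as the commands above. `recreate_if_rolled_back` is a hypothetical name.

```bash
# Hypothetical guard: delete and recreate only a stack in ROLLBACK_COMPLETE.
recreate_if_rolled_back() {
  stack=$1
  stack_status=$(aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text) || return 1
  if [ "$stack_status" != "ROLLBACK_COMPLETE" ]; then
    echo "refusing: $stack is $stack_status, not ROLLBACK_COMPLETE" >&2
    return 1
  fi
  # Each step runs only if the previous one succeeded
  aws cloudformation delete-stack --stack-name "$stack" &&
    aws cloudformation wait stack-delete-complete --stack-name "$stack" &&
    aws cloudformation create-stack --stack-name "$stack" \
      --template-body file://template.yaml \
      --parameters file://parameters.json \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM &&
    aws cloudformation wait stack-create-complete --stack-name "$stack"
}
```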
### Retain Resources on Failure

Mark stateful resources so that CloudFormation keeps them instead of deleting them:
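```yaml
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      # ...
```

`DeletionPolicy: Retain` keeps the database when it is removed from the template, when the stack is deleted, or when a failed creation is rolled back. `UpdateReplacePolicy: Retain` keeps the old instance when an update replaces it. This lets you fix the stack without losing data. However, a retained resource is no longer managed by the stack, so you must clean it up or import it yourself. A retained resource with a fixed name can also trigger the "already exists" error on the next attempt.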
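For development stacks, a complementary approach is to turn off automatic rollback, so the resources that were created successfully stay in place while you investigate the one that failed. `create-stack` and `update-stack` accept `--disable-rollback` for this, and `rollback-stack` returns the stack to its last stable state when you are done. This sketch uses hypothetical function names and assumes the same `template.yaml` and capabilities used elsewhere in this guide.

```bash
# Hypothetical dev helper: create without automatic rollback and, on failure,
# list the failed resources while the successful ones are still in place.
create_dev_stack() {
  aws cloudformation create-stack --stack-name "$1" \
    --template-body file://template.yaml \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
    --disable-rollback || return
  if ! aws cloudformation wait stack-create-complete --stack-name "$1"; then
    aws cloudformation describe-stack-resources --stack-name "$1" \
      --query 'StackResources[?ResourceStatus==`"CREATE_FAILED"`].[LogicalResourceId,ResourceStatusReason]' \
      --output text
    return 1
  fi
}

# Hypothetical helper: once you are done inspecting, roll back explicitly.
rollback_dev_stack() {
  aws cloudformation rollback-stack --stack-name "$1"
}
```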
### Use Change Sets
Before updating, preview changes:
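```bash
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes \
  --template-body file://template.yaml \
  --parameters file://parameters.json

# The change set is built asynchronously; wait before describing it
aws cloudformation wait change-set-create-complete \
  --stack-name my-stack \
  --change-set-name preview-changes

aws cloudformation describe-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes \
  --query 'Changes[*].ResourceChange.{Action:Action,Resource:LogicalResourceId,Type:ResourceType,Replacement:Replacement}'
```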
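Replacements are the changes most likely to lose data or trigger "already exists" errors, so it can be worth failing fast when a change set replaces anything. This sketch assumes the change set has already finished creating, and the function names are hypothetical. For modified resources, `Replacement` is `True`, `False`, or `Conditional`.

```bash
# Hypothetical helper: list resources the change set will or may replace.
list_replacements() {
  # $1 = stack name, $2 = change set name
  aws cloudformation describe-change-set \
    --stack-name "$1" \
    --change-set-name "$2" \
    --query 'Changes[?ResourceChange.Replacement==`"True"` || ResourceChange.Replacement==`"Conditional"`].[ResourceChange.LogicalResourceId,ResourceChange.ResourceType,ResourceChange.Replacement]' \
    --output text
}

# Returns non-zero (and prints the list) if anything would be replaced.
confirm_no_replacements() {
  replacements=$(list_replacements "$1" "$2") || return
  if [ -n "$replacements" ]; then
    echo "Change set $2 replaces (or may replace):"
    echo "$replacements"
    return 1
  fi
  echo "No replacements in $2"
}
```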
Execute only if the changes look correct:
```bash
aws cloudformation execute-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes
```

## Debugging Techniques
### Enable Termination Protection
Prevent accidental deletion:
```bash
aws cloudformation update-termination-protection \
  --stack-name my-stack \
  --enable-termination-protection
```

### Use Stack Policies
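Prevent updates to critical resources. Once a stack has a policy, every resource is protected by default. The policy therefore needs an Allow statement for everything you still want to update, and the explicit Deny then overrides it for MyDatabase:

```bash
aws cloudformation set-stack-policy \
  --stack-name my-stack \
  --stack-policy-body '{
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "*"
      },
      {
        "Effect": "Deny",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "LogicalResourceId/MyDatabase"
      }
    ]
  }'
```

### Detailed Event Log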
Get all events in a readable format:
```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --output json | jq -r '.StackEvents[] | "\(.Timestamp) [\(.ResourceStatus)] \(.LogicalResourceId) (\(.ResourceType)): \(.ResourceStatusReason // "N/A")"'
```

### Drift Detection
Check if resources have been modified outside CloudFormation:
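```bash
# Start drift detection and keep the detection ID it returns
DETECTION_ID=$(aws cloudformation detect-stack-drift \
  --stack-name my-stack \
  --query 'StackDriftDetectionId' \
  --output text)

# Check progress (repeat until DetectionStatus is no longer DETECTION_IN_PROGRESS)
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id "$DETECTION_ID"

# View drifted resources
aws cloudformation describe-stack-resource-drifts \
  --stack-name my-stack \
  --query 'StackResourceDrifts[?StackResourceDriftStatus==`MODIFIED`]'
```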
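Drift detection runs asynchronously, so a script has to poll the detection status before reading results. This polling sketch assumes you pass the `StackDriftDetectionId` returned by `detect-stack-drift`. `wait_for_drift` is a hypothetical name, and the 5-second default interval is an arbitrary choice.

```bash
# Hypothetical helper: poll a drift detection until it finishes.
# Prints "<DetectionStatus><TAB><StackDriftStatus>" and returns 0 only on DETECTION_COMPLETE.
wait_for_drift() {
  # $1 = StackDriftDetectionId; $2 = optional seconds between polls (default 5)
  while :; do
    state=$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "$1" \
      --query '[DetectionStatus,StackDriftStatus]' \
      --output text) || return
    case "$state" in
      DETECTION_IN_PROGRESS*) sleep "${2:-5}" ;;
      DETECTION_COMPLETE*) echo "$state"; return 0 ;;
      *) echo "$state"; return 1 ;;
    esac
  done
}
```

A DETECTION_FAILED result can still come with partial results, because detection may have succeeded for some resources but not others. Read those results with `describe-stack-resource-drifts` as shown above.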
## Verification Steps
After fixing and updating:
```bash
# Wait for stack to complete
aws cloudformation wait stack-create-complete --stack-name my-stack
# or
aws cloudformation wait stack-update-complete --stack-name my-stack

# Verify stack outputs
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].Outputs'

# List all resources
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[*].[LogicalResourceId,PhysicalResourceId,ResourceStatus]'
```
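If you don't know whether the last operation was a create, an update, or an import, a small wrapper can pick the matching waiter from the current status. This is a sketch with a hypothetical name. CLI waiters give up after a fixed number of polling attempts, so a very long operation may need a second call.

```bash
# Hypothetical helper: wait on whichever operation the stack is currently running.
wait_for_stack() {
  stack_status=$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text) || return
  case "$stack_status" in
    CREATE_IN_PROGRESS)
      aws cloudformation wait stack-create-complete --stack-name "$1" ;;
    UPDATE_IN_PROGRESS|UPDATE_COMPLETE_CLEANUP_IN_PROGRESS)
      aws cloudformation wait stack-update-complete --stack-name "$1" ;;
    IMPORT_IN_PROGRESS)
      aws cloudformation wait stack-import-complete --stack-name "$1" ;;
    CREATE_COMPLETE|UPDATE_COMPLETE|IMPORT_COMPLETE)
      # Already finished successfully; nothing to wait for
      : ;;
    *)
      echo "$1 is $stack_status; not a state this helper waits on" >&2
      return 1 ;;
  esac
}
```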
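Set up drift detection on a schedule. A stack ARN is not a valid EventBridge rule target, but an EventBridge Scheduler schedule can call the DetectStackDrift API directly through a universal target. The role name below is a placeholder: the role must trust `scheduler.amazonaws.com` and allow `cloudformation:DetectStackDrift`, plus read access to the stack's resources. The schedule only starts detection, so you still read the results with the drift commands above.

```bash
aws scheduler create-schedule \
  --name daily-drift-check \
  --schedule-expression "rate(1 day)" \
  --flexible-time-window Mode=OFF \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:cloudformation:detectStackDrift",
    "RoleArn": "arn:aws:iam::123456789012:role/drift-check-scheduler-role",
    "Input": "{\"StackName\": \"my-stack\"}"
  }'
```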