# Fix AWS CloudFormation Stack Rollback

Your CloudFormation stack just failed and rolled back, or worse, it's stuck in ROLLBACK_COMPLETE or UPDATE_ROLLBACK_FAILED. The error message is vague, and you're left wondering what went wrong and how to fix it.

CloudFormation rollbacks happen when a resource creation or update fails. By default, CloudFormation rolls back all changes to maintain consistency. But understanding *why* it failed and *how* to recover requires digging into the stack events and resources.

## Diagnosis Commands

Start by checking the stack status:

```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].[StackName,StackStatus,StackStatusReason]' \
  --output table
```
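
Which recovery path applies depends on the status. As a rough triage aid, the helper below reads the status with the same `describe-stacks` call and maps the common rollback states to the recovery steps covered later in this article. `cfn_next_step` is a hypothetical name, not an AWS CLI command, and statuses it doesn't recognize are simply printed.

```bash
# Hypothetical helper: print a suggested next step for a stack's current status.
# The mapping mirrors the recovery paths in this article and is not exhaustive.
cfn_next_step() {
  local stack="$1" status
  status=$(aws cloudformation describe-stacks \
    --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' \
    --output text) || return 1

  case "$status" in
    ROLLBACK_COMPLETE)
      echo "$stack: creation failed; delete the stack and recreate it" ;;
    ROLLBACK_FAILED)
      echo "$stack: rollback after a failed creation failed; inspect the failed resources, then delete the stack" ;;
    UPDATE_ROLLBACK_FAILED)
      echo "$stack: fix the failed resources, then run continue-update-rollback" ;;
    UPDATE_ROLLBACK_COMPLETE)
      echo "$stack: previous state restored; fix the template and update again" ;;
    *_IN_PROGRESS)
      echo "$stack: operation in progress ($status); wait for it to finish" ;;
    *)
      echo "$stack: status is $status" ;;
  esac
}
```

Running `cfn_next_step my-stack` prints a single line of advice for that stack.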

Get detailed events to find the failure:

```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED` || ResourceStatus==`UPDATE_FAILED`].[Timestamp,LogicalResourceId,ResourceType,ResourceStatusReason]' \
  --output table
```

The ResourceStatusReason usually tells you what went wrong. If it's truncated, get the full details:

```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]' \
  --output json | jq .
```

For nested stacks, you need to check each nested stack:

```bash
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[?ResourceType==`AWS::CloudFormation::Stack`].[LogicalResourceId,PhysicalResourceId]' \
  --output table
```

Then check events for each nested stack, passing its PhysicalResourceId (the nested stack's ARN) as `--stack-name`:

```bash
aws cloudformation describe-stack-events \
  --stack-name nested-stack-name \
  --query 'StackEvents[?contains(ResourceStatus, `FAILED`)]'
```
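
To avoid copying ARNs by hand, the two commands can be combined into a loop. This is a sketch: `nested_stack_failures` is a hypothetical helper name, and it only goes one level deep, so nested stacks inside nested stacks need another pass.

```bash
# Hypothetical helper: print the FAILED events of every nested stack in a parent.
# A nested stack's PhysicalResourceId is its stack ARN, which --stack-name accepts.
nested_stack_failures() {
  local parent="$1"
  aws cloudformation list-stack-resources \
    --stack-name "$parent" \
    --query 'StackResourceSummaries[?ResourceType==`AWS::CloudFormation::Stack`].PhysicalResourceId' \
    --output text |
    tr '\t' '\n' |
    while read -r nested_id; do
      # Skip blank lines and the "None" the CLI prints for empty results
      case "$nested_id" in ""|None) continue ;; esac
      echo "== $nested_id"
      # Redirect stdin so the inner call can't consume the loop's input
      aws cloudformation describe-stack-events \
        --stack-name "$nested_id" \
        --query 'StackEvents[?contains(ResourceStatus, `FAILED`)].[LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
        --output text </dev/null
    done
}
```

Run it as `nested_stack_failures my-stack`; each nested stack's ARN appears as a `==` header followed by its failed events.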

List resources that are in any FAILED state:

```bash
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[?contains(ResourceStatus, `FAILED`)]' \
  --output table
```

If you're in UPDATE_ROLLBACK_FAILED, identify what resource is stuck:

```bash
aws cloudformation describe-stack-resources \
  --stack-name my-stack \
  --query 'StackResources[?ResourceStatus==`UPDATE_ROLLBACK_FAILED` || ResourceStatus==`UPDATE_FAILED`]'
```

## Common Causes and Solutions

### Resource Already Exists

CloudFormation tries to create a resource that already exists outside the stack:

```text
Resource creation failed: s3-bucket-name already exists in account
```

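Solution: Either use a different resource name or import the existing resource into the stack. To import, the template must declare the resource (here MyBucket) with a DeletionPolicy attribute, and the resource must not already belong to another stack:

```bash
# Create an import change set for a bucket that already exists in this account and Region
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name import-bucket \
  --change-set-type IMPORT \
  --resources-to-import '[{"ResourceType":"AWS::S3::Bucket","LogicalResourceId":"MyBucket","ResourceIdentifier":{"BucketName":"existing-bucket-name"}}]' \
  --template-body file://template.yaml

# Wait for the change set to be created, then apply it
aws cloudformation wait change-set-create-complete \
  --stack-name my-stack \
  --change-set-name import-bucket
aws cloudformation execute-change-set \
  --stack-name my-stack \
  --change-set-name import-bucket
```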
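
Import only works for resources in the same account and Region as the stack, so it can help to confirm ownership before creating the change set. The sketch below uses `head-bucket` with `--expected-bucket-owner`, which fails if the bucket is missing, not accessible to you, or owned by another account. `bucket_owned_by` is a hypothetical helper name, and it does not check the bucket's Region.

```bash
# Hypothetical helper: succeed only if BUCKET exists and is owned by ACCOUNT_ID.
# Usage: bucket_owned_by BUCKET ACCOUNT_ID
bucket_owned_by() {
  local bucket="$1" account_id="$2"
  if aws s3api head-bucket \
      --bucket "$bucket" \
      --expected-bucket-owner "$account_id" >/dev/null 2>&1; then
    echo "ok: $bucket is in account $account_id"
  else
    echo "not importable: $bucket is missing, inaccessible, or owned by another account" >&2
    return 1
  fi
}
```

For example, `bucket_owned_by existing-bucket-name 123456789012` before running the import.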

Or modify your template to use a unique name:

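```yaml
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      # Bucket names must be lowercase and at most 63 characters,
      # so this assumes a short, lowercase stack name
      BucketName: !Sub "${AWS::StackName}-my-bucket-${AWS::AccountId}"
```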

### Insufficient IAM Permissions

CloudFormation or the service role lacks permissions to create resources:

```text
API: ec2:RunInstances User: arn:aws:sts::... is not authorized to perform: ec2:RunInstances
```

Check the stack's service role:

```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].RoleARN'
```

If the stack has no service role, CloudFormation uses a temporary session based on the credentials of whoever runs the operation, so that user or role needs permission to create every resource in the template. Alternatively, create a dedicated service role and attach it to the stack:

```bash
# Create a service role that CloudFormation can assume
aws iam create-role \
  --role-name cloudformation-service-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "cloudformation.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach permissions. AdministratorAccess is convenient for testing;
# scope it down to the services your template uses in production.
aws iam attach-role-policy \
  --role-name cloudformation-service-role \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Update the stack to use the service role
# (the caller needs iam:PassRole on this role)
aws cloudformation update-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --role-arn arn:aws:iam::123456789012:role/cloudformation-service-role \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
```
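
If you're not sure which permissions are missing, the IAM policy simulator can evaluate a list of actions without calling them. In this sketch, `denied_actions` is a hypothetical helper: it takes an IAM user or role ARN (not the `arn:aws:sts::...` session ARN from the error message) followed by the actions to test, and prints only the actions that are not allowed. Treat the result as a strong hint; it can differ from real calls when resource-based policies or condition keys are involved.

```bash
# Hypothetical helper: print the actions a principal is NOT allowed to call.
# Usage: denied_actions PRINCIPAL_ARN ACTION [ACTION...]
denied_actions() {
  local principal_arn="$1"
  shift
  # EvalDecision is "allowed", "explicitDeny", or "implicitDeny"
  aws iam simulate-principal-policy \
    --policy-source-arn "$principal_arn" \
    --action-names "$@" \
    --query 'EvaluationResults[?EvalDecision!=`allowed`].[EvalActionName,EvalDecision]' \
    --output text
}
```

For example, `denied_actions arn:aws:iam::123456789012:role/cloudformation-service-role ec2:RunInstances s3:CreateBucket`; empty output means every listed action was allowed.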

### Resource Dependency Failure

A resource depends on another resource that failed to create:

```text
Resource creation cancelled
```

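This message means CloudFormation stopped working on this resource because another resource in the stack failed; it is a symptom, not the cause. The root cause is usually the earliest failure, so sort the events oldest first and look at the start of the list. On a stack with a long history, check the timestamps to make sure the events belong to the deployment you're debugging:

```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'sort_by(StackEvents, &Timestamp)[?ResourceStatus==`CREATE_FAILED`]' \
  --output json | jq '.[:5]'  # Earliest 5 failed events, oldest first
```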

Fix the root cause resource first, then retry the stack.
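
Because cancellations pile up after the real failure, another way to surface the root cause is to drop the cancellation events and keep the oldest remaining failure. The helper name `first_real_failure` is hypothetical, and it assumes cancellations use the wording shown above or its update counterpart, "Resource update cancelled".

```bash
# Hypothetical helper: print the oldest FAILED event that isn't a cancellation.
# describe-stack-events returns events newest first, so the last line is the oldest.
# It covers the stack's whole history; check the timestamp on long-lived stacks.
first_real_failure() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query 'StackEvents[?contains(ResourceStatus, `FAILED`)].[Timestamp,LogicalResourceId,ResourceStatusReason]' \
    --output text |
    grep -v -e 'Resource creation cancelled' -e 'Resource update cancelled' |
    tail -n 1
}
```

Running `first_real_failure my-stack` prints one tab-separated line: timestamp, logical ID, and reason.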

### Timeout During Creation

A resource takes longer to create than CloudFormation allows:

```text
Resource creation timed out
```

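For custom resources, set ServiceTimeout (in seconds, from 1 to 3600) to control how long CloudFormation waits for a response before failing the resource:

```yaml
Resources:
  MyCustomResource:
    Type: Custom::MyResource
    Properties:
      ServiceToken: !GetAtt MyFunction.Arn
      ServiceTimeout: 600  # 10 minutes; the default is 3600 (1 hour)
```

For AWS resource types, the per-resource timeout is managed by CloudFormation and can't be changed. Check whether the failure came from a timeout you control instead: a CreationPolicy or WaitCondition timeout, or a stack-level timeout set with `--timeout-in-minutes` on `create-stack`. If so, raise it, or make the resource finish faster, for example with a larger instance type or by pre-creating slow dependencies.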

### Invalid Property Values

The template uses an unsupported property or an invalid property value:

```text
Encountered unsupported property Name
```

Or:

```text
Property validation failure: Value 'invalid-value' for property 'Property'
```

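Validate your template before deploying. Note that `validate-template` only checks that the template is well-formed JSON or YAML with a valid structure; it does not check most property values, so a linter such as cfn-lint catches more of these errors:

```bash
aws cloudformation validate-template \
  --template-body file://template.yaml
```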

Use the AWS documentation to find valid values. For example, EC2 instance types must be valid:

```bash
aws ec2 describe-instance-types \
  --filters Name=instance-type,Values=t3.* \
  --query 'InstanceTypes[*].InstanceType'
```
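
A valid type name can still fail if the type isn't offered in your Region or in the Availability Zone of your subnet. The hypothetical helper `instance_type_offered` below checks either case with `describe-instance-type-offerings` and succeeds only if the type is offered there.

```bash
# Hypothetical helper: succeed if INSTANCE_TYPE is offered in the current Region,
# or in AVAILABILITY_ZONE when one is given.
# Usage: instance_type_offered INSTANCE_TYPE [AVAILABILITY_ZONE]
instance_type_offered() {
  local itype="$1" az="${2:-}" found
  if [ -n "$az" ]; then
    found=$(aws ec2 describe-instance-type-offerings \
      --location-type availability-zone \
      --filters "Name=instance-type,Values=$itype" "Name=location,Values=$az" \
      --query 'InstanceTypeOfferings[].InstanceType' \
      --output text) || return 1
  else
    found=$(aws ec2 describe-instance-type-offerings \
      --location-type region \
      --filters "Name=instance-type,Values=$itype" \
      --query 'InstanceTypeOfferings[].InstanceType' \
      --output text) || return 1
  fi
  [ "$found" = "$itype" ]
}
```

For example, `instance_type_offered t3.micro` checks the current Region, and `instance_type_offered t3.micro us-east-1a` checks a single Availability Zone.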

### VPC or Subnet Issues

Resources fail to create in the specified VPC/subnet:

```text
The subnet ID 'subnet-12345' does not exist
```

Or:

```text
VPC vpc-12345 has no internet gateway
```

Verify the VPC and subnet exist and are correct:

```bash
# Check that the VPC exists
aws ec2 describe-vpcs \
  --vpc-ids vpc-12345

# Check that the subnet exists and is in the correct VPC
aws ec2 describe-subnets \
  --subnet-ids subnet-12345 \
  --query 'Subnets[*].[SubnetId,VpcId,AvailabilityZone,CidrBlock]'

# Check for an internet gateway attached to the VPC
aws ec2 describe-internet-gateways \
  --filters Name=attachment.vpc-id,Values=vpc-12345
```
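
To turn the subnet check into a pass/fail test, for example in a pre-deployment script, compare the subnet's VPC with the one your template expects. `subnet_in_vpc` is a hypothetical helper name; it fails if the subnet doesn't exist or belongs to a different VPC.

```bash
# Hypothetical helper: succeed only if SUBNET_ID exists and belongs to EXPECTED_VPC.
# describe-subnets exits non-zero (InvalidSubnetID.NotFound) for unknown subnets.
# Usage: subnet_in_vpc SUBNET_ID EXPECTED_VPC
subnet_in_vpc() {
  local subnet_id="$1" expected_vpc="$2" actual_vpc
  actual_vpc=$(aws ec2 describe-subnets \
    --subnet-ids "$subnet_id" \
    --query 'Subnets[0].VpcId' \
    --output text) || return 1
  if [ "$actual_vpc" != "$expected_vpc" ]; then
    echo "$subnet_id is in $actual_vpc, not $expected_vpc" >&2
    return 1
  fi
}
```

For example, `subnet_in_vpc subnet-12345 vpc-12345 && echo "subnet OK"`.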

### Update Rollback Failed

When a stack is in UPDATE_ROLLBACK_FAILED, it means the rollback itself failed:

```bash
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].[StackName,StackStatus,StackStatusReason]'
```

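Fix whatever blocked the rollback first (for example, a resource that was deleted or changed outside CloudFormation, or a missing permission), then continue the rollback:

```bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack
```

If the rollback still fails, you can skip resources that can't be rolled back. Only resources in the UPDATE_FAILED state can be skipped. CloudFormation marks skipped resources UPDATE_COMPLETE without rolling them back, so they no longer match the template until you fix them. For a resource in a nested stack, use the form NestedStackName.ResourceLogicalID:

```bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MyProblematicResource
```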
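
On stacks with several stuck resources, the skip list can be built from the resources currently in UPDATE_FAILED. This sketch (`skip_command_for` is a hypothetical helper) only prints the command so you can review it before running it, and it ignores resources inside nested stacks.

```bash
# Hypothetical helper: print (not run) a continue-update-rollback command that
# skips every top-level resource in UPDATE_FAILED. Review each one first.
# Usage: skip_command_for STACK
skip_command_for() {
  local stack="$1" ids
  ids=$(aws cloudformation list-stack-resources \
    --stack-name "$stack" \
    --query 'StackResourceSummaries[?ResourceStatus==`UPDATE_FAILED`].LogicalResourceId' \
    --output text) || return 1
  if [ -z "$ids" ] || [ "$ids" = "None" ]; then
    echo "no UPDATE_FAILED resources in $stack" >&2
    return 1
  fi
  # --output text separates list items with tabs; join them with spaces instead
  echo "aws cloudformation continue-update-rollback --stack-name $stack --resources-to-skip $(printf '%s' "$ids" | tr '\t' ' ')"
}
```

Running `skip_command_for my-stack` prints the full command for you to inspect and, if appropriate, run.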

## Recovery Strategies

### Delete and Recreate

A stack in ROLLBACK_COMPLETE can't be updated, so delete it and start over:

```bash
aws cloudformation delete-stack --stack-name my-stack

# Wait for deletion
aws cloudformation wait stack-delete-complete --stack-name my-stack

# Recreate
aws cloudformation create-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --parameters file://parameters.json \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
```
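
Because `delete-stack` is irreversible, a script that recreates stacks should check the status first. This sketch (`recreate_rolled_back_stack` is a hypothetical helper) refuses to do anything unless the stack is in ROLLBACK_COMPLETE, then runs the same delete, wait, and create commands, stopping at the first one that fails.

```bash
# Hypothetical helper: delete and recreate a stack only if it is in ROLLBACK_COMPLETE.
# Usage: recreate_rolled_back_stack STACK TEMPLATE_FILE PARAMETERS_FILE
recreate_rolled_back_stack() {
  local stack="$1" template="$2" params="$3" status
  status=$(aws cloudformation describe-stacks \
    --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' \
    --output text) || return 1

  if [ "$status" != "ROLLBACK_COMPLETE" ]; then
    echo "refusing to delete $stack: status is $status" >&2
    return 1
  fi

  # Each step runs only if the previous one succeeded
  aws cloudformation delete-stack --stack-name "$stack" &&
    aws cloudformation wait stack-delete-complete --stack-name "$stack" &&
    aws cloudformation create-stack \
      --stack-name "$stack" \
      --template-body "file://$template" \
      --parameters "file://$params" \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
}
```

For example, `recreate_rolled_back_stack my-stack template.yaml parameters.json`.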

### Retain Resources on Failure

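To keep stateful resources such as databases when they are removed from the template, replaced during an update, or deleted along with the stack, set DeletionPolicy and UpdateReplacePolicy to Retain:

```yaml
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      # ...
```

This lets you fix issues without losing data. Retained resources are also kept when a failed stack creation rolls back, which can leave orphans that trigger "already exists" errors on the next attempt; DeletionPolicy: RetainExceptOnCreate avoids that by deleting the resource if the operation that created it rolls back.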

### Use Change Sets

Before updating, preview changes:

```bash
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes \
  --template-body file://template.yaml \
  --parameters file://parameters.json

# Wait until CloudFormation has computed the change set
aws cloudformation wait change-set-create-complete \
  --stack-name my-stack \
  --change-set-name preview-changes

aws cloudformation describe-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes \
  --query 'Changes[*].ResourceChange.{Action:Action,Resource:LogicalResourceId,Type:ResourceType,Replacement:Replacement}'
```

Execute only if the changes look correct:

```bash
aws cloudformation execute-change-set \
  --stack-name my-stack \
  --change-set-name preview-changes
```
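
Replacements deserve the closest look: replacing a resource creates a new physical resource and, unless UpdateReplacePolicy says otherwise, deletes the old one, which for a database or bucket can mean data loss. The hypothetical helper `change_set_replacements` prints the resources a change set will replace (Replacement is "True") or may replace ("Conditional") and returns a non-zero status if there are any, so a deployment script can stop before `execute-change-set`.

```bash
# Hypothetical helper: list resources a change set replaces or may replace.
# Returns 1 if any are found, 0 if none, 2 if the describe call fails.
# Usage: change_set_replacements STACK CHANGE_SET
change_set_replacements() {
  local stack="$1" change_set="$2" replacements
  # Replacement is a string field: "True", "False", or "Conditional"
  replacements=$(aws cloudformation describe-change-set \
    --stack-name "$stack" \
    --change-set-name "$change_set" \
    --query 'Changes[?ResourceChange.Replacement==`"True"` || ResourceChange.Replacement==`"Conditional"`].[ResourceChange.LogicalResourceId,ResourceChange.Replacement]' \
    --output text) || return 2
  if [ -z "$replacements" ]; then
    return 0
  fi
  echo "$replacements"
  return 1
}
```

In a script, `change_set_replacements my-stack preview-changes || exit 1` stops the deployment when anything would be replaced.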

## Debugging Techniques

### Enable Termination Protection

Prevent accidental deletion:

```bash
aws cloudformation update-termination-protection \
  --stack-name my-stack \
  --enable-termination-protection
```

### Use Stack Policies

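Prevent updates to critical resources. Once a stack has a policy, any update the policy doesn't explicitly allow is denied, so include an Allow statement alongside the Deny:

```bash
aws cloudformation set-stack-policy \
  --stack-name my-stack \
  --stack-policy-body '{
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "*"
      },
      {
        "Effect": "Deny",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "LogicalResourceId/MyDatabase"
      }
    ]
  }'
```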

### Detailed Event Log

Get all events in a readable format:

```bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --output json | jq -r '.StackEvents[] | "\(.Timestamp) [\(.ResourceStatus)] \(.LogicalResourceId) (\(.ResourceType)): \(.ResourceStatusReason // "N/A")"'
```

### Drift Detection

Check if resources have been modified outside CloudFormation:

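```bash
# Start drift detection and capture its ID
DETECTION_ID=$(aws cloudformation detect-stack-drift \
  --stack-name my-stack \
  --query StackDriftDetectionId \
  --output text)

# Check progress; repeat until DetectionStatus is DETECTION_COMPLETE
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id "$DETECTION_ID"

# View drifted resources
aws cloudformation describe-stack-resource-drifts \
  --stack-name my-stack \
  --query 'StackResourceDrifts[?StackResourceDriftStatus==`MODIFIED`]'
```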
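
The CloudFormation waiters in the AWS CLI cover stack and change set operations but not drift detection, so a script has to poll. In this sketch, `detect_drift_and_wait` is a hypothetical helper that starts detection, checks the status every few seconds (10 by default), and prints the stack's drift status, such as IN_SYNC or DRIFTED, once detection completes.

```bash
# Hypothetical helper: start drift detection, poll until it finishes, and print
# the stack drift status (for example IN_SYNC or DRIFTED).
# Usage: detect_drift_and_wait STACK [POLL_SECONDS]
detect_drift_and_wait() {
  local stack="$1" interval="${2:-10}" id status
  id=$(aws cloudformation detect-stack-drift \
    --stack-name "$stack" \
    --query StackDriftDetectionId \
    --output text) || return 1

  while :; do
    # Text output is "<DetectionStatus><TAB><StackDriftStatus>"
    status=$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "$id" \
      --query '[DetectionStatus,StackDriftStatus]' \
      --output text) || return 1
    case "$status" in
      DETECTION_IN_PROGRESS*) sleep "$interval" ;;
      DETECTION_COMPLETE*) printf '%s\n' "$status" | cut -f2; return 0 ;;
      *) echo "drift detection did not complete: $status" >&2; return 1 ;;
    esac
  done
}
```

For example, `[ "$(detect_drift_and_wait my-stack)" = "IN_SYNC" ] || echo "my-stack has drifted"`.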

## Verification Steps

After fixing and updating:

```bash
# Wait for the stack operation to complete
aws cloudformation wait stack-create-complete --stack-name my-stack
# or
aws cloudformation wait stack-update-complete --stack-name my-stack

# Verify stack outputs
aws cloudformation describe-stacks \
  --stack-name my-stack \
  --query 'Stacks[*].Outputs'

# List all resources
aws cloudformation list-stack-resources \
  --stack-name my-stack \
  --query 'StackResourceSummaries[*].[LogicalResourceId,PhysicalResourceId,ResourceStatus]'
```

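Set up drift detection on a schedule. An EventBridge rule can't target a stack ARN directly; one option is an EventBridge Scheduler schedule with a universal target that calls the DetectStackDrift API. The role ARN below is a placeholder: the role must trust scheduler.amazonaws.com and allow cloudformation:DetectStackDrift plus read access to the stack's resources.

```bash
aws scheduler create-schedule \
  --name daily-drift-check \
  --schedule-expression "rate(1 day)" \
  --flexible-time-window '{"Mode":"OFF"}' \
  --target '{"Arn":"arn:aws:scheduler:::aws-sdk:cloudformation:detectStackDrift","RoleArn":"arn:aws:iam::123456789012:role/drift-check-scheduler-role","Input":"{\"StackName\":\"my-stack\"}"}'
```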