# Fix AWS Unexpected High Costs Investigation

You log into AWS and see a billing alert—your monthly costs have spiked unexpectedly. Maybe it's 2x, 5x, or even 10x your normal bill. Before you panic, you need to systematically investigate where the money is going. AWS provides multiple tools for cost analysis, but navigating them requires knowing where to look.

This guide covers investigating cost spikes, identifying culprits, and implementing fixes.

## Diagnosis Commands

First, get an overview of your costs:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get a daily breakdown to find when the spike started. Note that when you add `--group-by`, the `Total` field comes back empty and amounts appear under `Groups` instead, so for daily totals leave the grouping off:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```
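To pinpoint the spike day from that daily output, a small jq/awk pass works. The sketch below runs on canned sample JSON standing in for the live CLI output (pipe the real output in instead); it assumes `jq` is installed:

```bash
# Sample of what `aws ce get-cost-and-usage --granularity DAILY` returns;
# in practice, pipe the live CLI output into the jq command instead.
cat > /tmp/daily-costs.json <<'EOF'
{"ResultsByTime":[
  {"TimePeriod":{"Start":"2026-03-01"},"Total":{"BlendedCost":{"Amount":"12.50"}}},
  {"TimePeriod":{"Start":"2026-03-02"},"Total":{"BlendedCost":{"Amount":"13.10"}}},
  {"TimePeriod":{"Start":"2026-03-03"},"Total":{"BlendedCost":{"Amount":"98.40"}}}
]}
EOF

# Extract (day, amount) pairs, then flag the largest day-over-day jump.
spike=$(jq -r '.ResultsByTime[] | [.TimePeriod.Start, .Total.BlendedCost.Amount] | @tsv' /tmp/daily-costs.json \
  | awk -F'\t' 'NR>1 && $2-prev>max {max=$2-prev; day=$1} {prev=$2} END {printf "%s +%.2f", day, max}')
echo "largest jump: $spike"
```

On the sample data this flags 2026-03-03, the day the bill jumped by roughly $85.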

Compare with previous month:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-02-01,End=2026-03-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].Total.BlendedCost.Amount'
```

Get cost by region:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=REGION \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get cost by usage type (for identifying specific charges):

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output text | sort -k2 -n -r | head -20
```

Get cost by resource (grouping by `RESOURCE_ID` requires resource-level data to be enabled in the Cost Explorer settings, and historical coverage is limited):

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=RESOURCE_ID \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Check reserved instance and Savings Plans coverage:

```bash
aws ce get-reservation-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --query 'CoveragesByTime[*].Total.CoverageHours.CoverageHoursPercentage'

aws ce get-savings-plans-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY
```

## Common Cost Spikes and Solutions

### EC2 Running Instances

The most common culprit—running EC2 instances you don't need:

```bash
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,LaunchTime,Tags[?Key==`Name`].Value]'
```

Identify idle instances:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average \
  --query 'Datapoints[*].[Timestamp,Average]'
```

Instances with average CPU < 5% might be candidates for stopping:

```bash
aws ec2 stop-instances --instance-ids i-idle-instance
```

For instances that have been unused long-term, terminate them (this is irreversible and discards any instance-store data):

```bash
aws ec2 terminate-instances --instance-ids i-unused-instance
```

### NAT Gateway Charges

NAT gateways cost ~$0.045/hour plus data processing charges:

```bash
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId]'
```

Check NAT gateway data usage:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum \
  --query 'Datapoints[*].[Timestamp,Sum]'
```
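A quick back-of-envelope estimate shows what a gateway actually costs per month. The rates below are assumed us-east-1 pricing ($0.045/hour base plus $0.045/GB processed) and the traffic figure is hypothetical; check current rates for your region:

```bash
# Rough monthly NAT gateway cost: hourly base charge + per-GB data processing.
hours_per_month=730
gb_processed=500   # hypothetical monthly traffic through the gateway
monthly=$(awk -v h="$hours_per_month" -v g="$gb_processed" \
  'BEGIN {printf "%.2f", h * 0.045 + g * 0.045}')
echo "estimated monthly cost: \$$monthly"
```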

If a NAT gateway has minimal usage, consider:

- Deleting the unused NAT gateway
- Replacing it with a NAT instance (cheaper for low throughput)
- Using VPC endpoints so traffic to AWS services bypasses the NAT entirely
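For the VPC endpoint option, a gateway endpoint for S3 removes S3 traffic from the NAT path at no hourly charge. A sketch, where `vpc-12345` and `rtb-12345` are placeholder IDs you must replace; the command is printed for review rather than executed:

```bash
# Build the create-vpc-endpoint call; echo it so it can be reviewed before running.
region="us-west-2"
cmd="aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.${region}.s3 \
  --route-table-ids rtb-12345"
echo "$cmd"
```

Gateway endpoints exist for S3 and DynamoDB; other services use interface endpoints, which do carry an hourly charge.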

Delete unused NAT gateway:

```bash
aws ec2 delete-nat-gateway --nat-gateway-id nat-unused
```

### Data Transfer Charges

Data transfer out of AWS to the internet is expensive (roughly $0.09/GB for the first 10 TB in most US regions). Usage type names are region-prefixed (`USW2-` is us-west-2), so adjust the filter values to your region:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"USAGE_TYPE","Values":["USW2-DataTransfer-Out-Bytes"]}}' \
  --query 'ResultsByTime[*].Total.BlendedCost.Amount'
```

Check CloudFront usage (often cheaper for data transfer):

```bash aws cloudfront get-distribution \ --id E1234567890ABC \ --query 'Distribution.DistributionConfig'

aws cloudwatch get-metric-statistics \ --namespace AWS/CloudFront \ --metric-name BytesDownloaded \ --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \ --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \ --period 86400 \ --statistics Sum ```

Reduce data transfer:

- Use CloudFront for content delivery
- Use VPC endpoints for AWS service access
- Compress data before transfer
- Prefer same-region replication for S3 over cross-region
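On the compression point, even plain gzip before upload can cut egress bytes substantially for text-heavy payloads. A minimal local demonstration:

```bash
# Generate a repetitive text payload, then compare raw vs gzipped size.
printf 'repeated payload %.0s' $(seq 1 200) > /tmp/payload.txt
gzip -9 -c /tmp/payload.txt > /tmp/payload.txt.gz
orig=$(wc -c < /tmp/payload.txt)
comp=$(wc -c < /tmp/payload.txt.gz)
echo "original: $orig bytes, gzipped: $comp bytes"
```

Real savings depend entirely on how compressible your data is; already-compressed formats (JPEG, video, zip) gain nothing.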

### S3 Storage Costs

S3 storage accumulates quietly over time. List buckets and check their metrics configurations:

```bash
aws s3api list-buckets \
  --query 'Buckets[*].[Name,CreationDate]'

aws s3api get-bucket-metrics-configuration \
  --bucket my-bucket \
  --id EntireBucket
```

Get bucket size:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average
```

Check for large objects:

```bash
aws s3 ls s3://my-bucket --recursive --summarize | sort -k3 -n -r | head -20
```

Reduce S3 costs:

- Move old objects to S3 Glacier
- Delete old versions if versioning is enabled
- Use lifecycle policies:

```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```

Where `lifecycle.json` contains:

```json
{
  "Rules": [
    {
      "ID": "MoveToGlacier",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ]
    },
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}
```

### RDS Unused Instances

RDS instances running when not needed:

```bash
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,StorageType,AllocatedStorage,DBInstanceStatus]'
```

Check for idle databases:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-db \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

If connections are consistently zero, stop the instance (note that a stopped RDS instance is automatically restarted after seven days):

```bash
aws rds stop-db-instance --db-instance-identifier my-db
```

For temporary databases you no longer need (irreversible; `--skip-final-snapshot` means no final backup is kept):

```bash
aws rds delete-db-instance \
  --db-instance-identifier temp-db \
  --skip-final-snapshot
```

### Lambda Over-Provisioned Memory

Frequently invoked Lambda functions with generous memory allocations add up quickly:

```bash
aws lambda list-functions \
  --query 'Functions[*].[FunctionName,MemorySize,Timeout]'
```

Check Lambda invocations:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

Check memory utilization. Lambda does not publish a memory metric in the `AWS/Lambda` namespace; the metric below comes from Lambda Insights, which must be enabled on the function (alternatively, read `Max Memory Used` from the `REPORT` lines in the function's CloudWatch Logs). The period is hourly because a 60-second period over 7 days exceeds the API's 1,440-datapoint limit:

```bash
aws cloudwatch get-metric-statistics \
  --namespace LambdaInsights \
  --metric-name memory_utilization \
  --dimensions Name=function_name,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Maximum
```

If max memory < 50%, reduce allocation:

```bash
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 256
```

### EBS Unattached Volumes

EBS volumes not attached to instances:

```bash
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,State,Attachments]'
```

Delete unattached volumes:

```bash
aws ec2 delete-volume --volume-id vol-unused
```

Or snapshot before deletion:

```bash
aws ec2 create-snapshot --volume-id vol-unused --description "Backup before deletion"
aws ec2 delete-volume --volume-id vol-unused
```

### Snapshot Accumulation

EBS snapshots accumulate over time. List them oldest-first (ISO timestamps sort lexically, so plain `sort` on the StartTime column works):

```bash
aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'Snapshots[*].[SnapshotId,VolumeSize,StartTime,Description]' \
  --output text | sort -k3
```

Delete old snapshots:

```bash
aws ec2 delete-snapshot --snapshot-id snap-old
```

Automate cleanup with an Amazon Data Lifecycle Manager (DLM) policy:

```bash
aws dlm create-lifecycle-policy \
  --execution-role-arn arn:aws:iam::123456789012:role/dlm-role \
  --description "Delete old snapshots" \
  --state ENABLED \
  --policy-details file://dlm-policy.json
```
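The `dlm-policy.json` file referenced above is not shown; a plausible minimal policy looks like the following, where the target tag and schedule values are illustrative — DLM snapshots volumes matching `TargetTags` daily and keeps the last seven:

```json
{
  "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{"Key": "Backup", "Value": "true"}],
  "Schedules": [
    {
      "Name": "DailySnapshot",
      "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
      "RetainRule": {"Count": 7}
    }
  ]
}
```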

### Elastic IP Addresses

Unattached Elastic IPs cost $0.005/hour (AWS now bills all public IPv4 addresses at this rate, so an unattached one is pure waste):

```bash
aws ec2 describe-addresses \
  --query 'Addresses[*].[PublicIp,AllocationId,AssociationId,InstanceId]'
```

Addresses without AssociationId are unattached. Release them:

```bash
aws ec2 release-address --allocation-id eipalloc-unused
```

### CloudWatch Logs Accumulation

Log groups accumulate data indefinitely unless a retention policy is set:

```bash
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,storedBytes,retentionInDays]'
```
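To rank log groups by size, feed the `describe-log-groups` output through jq. The sample JSON below stands in for the live output; `jq` is assumed to be installed:

```bash
# Sample describe-log-groups output; replace with the live CLI call in practice.
cat > /tmp/log-groups.json <<'EOF'
{"logGroups":[
  {"logGroupName":"/aws/lambda/small-fn","storedBytes":52428800},
  {"logGroupName":"/aws/lambda/big-fn","storedBytes":1073741824}
]}
EOF

# Sort descending by storedBytes and report the largest group.
biggest=$(jq -r '.logGroups | sort_by(-.storedBytes) | .[0].logGroupName' /tmp/log-groups.json)
echo "largest log group: $biggest"
```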

Set retention:

```bash
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30
```

Delete unused log groups:

```bash
aws logs delete-log-group --log-group-name /unused/logs
```

### Untagged Resources

Resources without tags can't be attributed to teams or projects. List resources that carry no tags at all:

```bash
aws resourcegroupstaggingapi get-resources \
  --query 'ResourceTagMappingList[?length(Tags)==`0`].ResourceARN'
```
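The tagging API cannot directly return resources missing a *specific* key, so to find everything lacking a `Project` tag, list all resources and filter client-side with jq. The sample JSON stands in for live `get-resources` output:

```bash
# Sample get-resources output; in practice pipe the live CLI output into jq.
cat > /tmp/resources.json <<'EOF'
{"ResourceTagMappingList":[
  {"ResourceARN":"arn:aws:ec2:us-west-2:123456789012:instance/i-tagged",
   "Tags":[{"Key":"Project","Value":"web"}]},
  {"ResourceARN":"arn:aws:ec2:us-west-2:123456789012:volume/vol-untagged",
   "Tags":[]}
]}
EOF

# Keep only resources whose tag keys do not include "Project".
untagged=$(jq -r '.ResourceTagMappingList[]
  | select([.Tags[].Key] | index("Project") | not)
  | .ResourceARN' /tmp/resources.json)
echo "$untagged"
```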

Tag resources for better tracking:

```bash
aws ec2 create-tags \
  --resources i-12345 \
  --tags Key=Project,Value=my-project Key=Environment,Value=production Key=Owner,Value=my-team
```

Enable cost allocation tags:

```bash
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Project,Status=Active
```

## Verification Steps

After cleanup, verify cost reduction:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-25,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```

Set up cost anomaly detection. The CLI takes JSON structures via `--anomaly-monitor` and `--anomaly-subscription`; substitute the monitor ARN returned by the first command, and your own subscriber address, before running the second:

```bash
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName":"CostAnomaly","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}'

aws ce create-anomaly-subscription \
  --anomaly-subscription '{"SubscriptionName":"CostAlert","Threshold":100,"Frequency":"DAILY","MonitorArnList":["arn:aws:ce::123456789012:anomalymonitor/CostAnomaly"],"Subscribers":[{"Type":"EMAIL","Address":"billing-alerts@example.com"}]}'
```

Set up billing alerts (the `AWS/Billing` metric is published only in us-east-1 and requires "Receive Billing Alerts" to be enabled in the account's billing preferences):

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name monthly-billing-threshold \
  --alarm-description "Monthly AWS bill exceeds threshold" \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts
```

Create a comprehensive cost audit script:

```bash
#!/bin/bash

echo "AWS Cost Investigation Report"
echo "=============================="

echo ""
echo "1. Monthly Cost by Service:"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output table

echo ""
echo "2. Running EC2 Instances:"
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,LaunchTime]' \
  --output table

echo ""
echo "3. NAT Gateways:"
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State]' \
  --output table

echo ""
echo "4. Unattached EBS Volumes:"
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType]' \
  --output table

echo ""
echo "5. RDS Instances:"
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus]' \
  --output table

echo ""
echo "6. Elastic IPs (Unattached):"
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
  --output table

echo ""
echo "7. S3 Bucket Sizes:"
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
  size=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
    --start-time $(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 86400 \
    --statistics Average \
    --query 'Datapoints[0].Average' \
    --output text 2>/dev/null || echo "0")
  echo "$bucket: $size bytes"
done

echo ""
echo "8. Daily Costs (Last 7 Days):"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '7 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```