# Investigating and Fixing Unexpected High AWS Costs
You log into AWS and see a billing alert—your monthly costs have spiked unexpectedly. Maybe it's 2x, 5x, or even 10x your normal bill. Before you panic, you need to systematically investigate where the money is going. AWS provides multiple tools for cost analysis, but navigating them requires knowing where to look.
This guide covers investigating cost spikes, identifying culprits, and implementing fixes.
## Diagnosis Commands

First, get an overview of your costs:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get a daily breakdown to find when the spike started:
```bash
# No --group-by here: Total is only populated when the results are not grouped
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```

Compare with the previous month:
```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-02-01,End=2026-03-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].Total.BlendedCost.Amount'
```

Get cost by region:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=REGION \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get cost by usage type (for identifying specific charges):
```bash
# Text output gives one "usage type<TAB>amount" row per line, which sort can rank
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --query 'ResultsByTime[].Groups[].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output text | sort -t $'\t' -k2,2gr | head -20
```

Get cost by resource (requires resource-level data to be enabled in the Cost Explorer settings):
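```bash
# Resource-level data is opt-in, covers only the last 14 days, and requires a filter.
# The service name below is an example; use a name from the by-service breakdown.
aws ce get-cost-and-usage-with-resources \
  --time-period Start=$(date -u -d '13 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity DAILY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}' \
  --group-by Type=DIMENSION,Key=RESOURCE_ID \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Check Reserved Instance and Savings Plans coverage: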
```bash
aws ce get-reservation-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --query 'CoveragesByTime[*].Total.CoverageHours.CoverageHoursPercentage'

aws ce get-savings-plans-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY
```
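
The month-over-month comparison earlier in this section comes down to a percent change between two totals. The following is a minimal sketch of that arithmetic; `pct_change` is a hypothetical helper (not an AWS CLI feature), and the dollar amounts are illustrative rather than real billing data.

```bash
#!/bin/sh
# Percent change between two monthly totals (hypothetical helper).
# $1: previous month's cost, $2: current month's cost, for example the
# Amount values returned by the February and March get-cost-and-usage calls.
pct_change() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    if (prev == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", (cur - prev) / prev * 100
  }'
}

# Illustrative numbers only: a bill that went from $1,200 to $3,000.
pct_change 1200 3000
```

A result well above your usual month-to-month variation is a signal to drill into the per-service and per-usage-type breakdowns.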
## Common Cost Spikes and Solutions

### EC2 Running Instances

The most common culprit is running EC2 instances you don't need:

```bash
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,LaunchTime,Tags[?Key==`Name`].Value]'
```

Identify idle instances:
```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average \
  --query 'Datapoints[*].[Timestamp,Average]'
```

Instances with an average CPU utilization below 5% are candidates for stopping:

```bash
aws ec2 stop-instances --instance-ids i-idle-instance
```

For instances that have gone unused for a long time, terminate them (this cannot be undone):

```bash
aws ec2 terminate-instances --instance-ids i-unused-instance
```

### NAT Gateway Charges
NAT gateways cost ~$0.045/hour plus data processing charges:

```bash
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId]'
```

Check NAT gateway data usage:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum \
  --query 'Datapoints[*].[Timestamp,Sum]'
```

If the NAT gateway has minimal usage, consider:

- Deleting the unused NAT gateway
- Replacing it with a NAT instance (cheaper for low throughput)
- Using VPC endpoints to avoid NAT for AWS services
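
To judge whether a gateway is worth keeping, it can help to turn the hourly rate and the measured traffic into a rough monthly figure. The sketch below is only an estimate under stated assumptions: `nat_monthly_cost` is a hypothetical helper, the hourly rate is the ~$0.045 quoted above, and the per-GB processing rate defaults to an assumed $0.045 that should be checked against the pricing for your Region.

```bash
#!/bin/sh
# Rough NAT gateway cost estimate (hypothetical helper, not part of the AWS CLI).
# $1: hours the gateway existed, $2: GB processed,
# $3: hourly rate (default 0.045, the figure quoted in this guide),
# $4: per-GB processing rate (default 0.045 is an ASSUMPTION; verify for your Region).
nat_monthly_cost() {
  awk -v h="$1" -v gb="$2" -v hourly="${3:-0.045}" -v per_gb="${4:-0.045}" \
    'BEGIN { printf "%.2f\n", h * hourly + gb * per_gb }'
}

# Illustrative inputs: a full month (730 hours), idle versus 500 GB processed.
nat_monthly_cost 730 0
nat_monthly_cost 730 500
```

Even a completely idle gateway accrues the hourly charge, which is why unused gateways are worth deleting rather than leaving in place.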
Delete an unused NAT gateway:

```bash
aws ec2 delete-nat-gateway --nat-gateway-id nat-unused
```

### Data Transfer Charges
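Data transfer out of AWS is expensive ($0.09/GB for the first 10 TB):

```bash
# Usage type names are Region-prefixed; take the exact values from your usage-type breakdown.
# Grouping by USAGE_TYPE is needed for the Groups[] query below to return rows.
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"USAGE_TYPE","Values":["USW2-Egress-Internet","USW2-DataTransfer-Out-Bytes"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Check CloudFront usage (often cheaper for data transfer):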
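```bash
aws cloudfront get-distribution \
  --id E1234567890ABC \
  --query 'Distribution.DistributionConfig'

# CloudFront metrics are published in us-east-1 with DistributionId and Region=Global dimensions
aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --namespace AWS/CloudFront \
  --metric-name BytesDownloaded \
  --dimensions Name=DistributionId,Value=E1234567890ABC Name=Region,Value=Global \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```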
Reduce data transfer:

- Use CloudFront for content delivery
- Use VPC endpoints for AWS service access
- Compress data before transfer
- Use regional replication for S3 instead of cross-region replication
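
To see how much a given egress volume contributes to the bill, the $0.09/GB first-tier rate quoted above can be applied directly. This is a rough sketch, not a pricing calculator: `egress_first_tier_cost` is a hypothetical helper, it treats 10 TB as 10 × 1024 GB (an assumption), and it prices only the portion that falls within the first tier because this guide does not quote the rates for higher tiers.

```bash
#!/bin/sh
# First-tier data transfer estimate (hypothetical helper).
# $1: GB transferred out to the internet in the month,
# $2: per-GB rate (default 0.09, the first-tier figure quoted in this guide).
# Usage beyond the first 10 TB is NOT priced here.
egress_first_tier_cost() {
  awk -v gb="$1" -v rate="${2:-0.09}" 'BEGIN {
    tier = 10 * 1024                        # first 10 TB in GB (assumed 1 TB = 1024 GB)
    billable = (gb < tier) ? gb : tier      # cap at the first tier
    printf "%.2f\n", billable * rate
  }'
}

# Illustrative input: 2,000 GB of egress in a month.
egress_first_tier_cost 2000
```

Comparing this figure with the matching usage-type line in Cost Explorer shows whether egress really is the driver of the spike.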
### S3 Storage Costs

S3 storage accumulates over time:

```bash
aws s3api list-buckets \
  --query 'Buckets[*].[Name,CreationDate]'

aws s3api get-bucket-metrics-configuration \
  --bucket my-bucket \
  --id EntireBucket
```
Get bucket size:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average
```

Check for large objects:

```bash
# Column 3 of the listing is the object size in bytes; --summarize is omitted so its totals do not pollute the sort
aws s3 ls s3://my-bucket --recursive | sort -k3 -n -r | head -20
```

Reduce S3 costs:

- Move old objects to S3 Glacier
- Delete old versions if versioning is enabled
- Use lifecycle policies:
```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```

Where `lifecycle.json` contains:

```json
{
  "Rules": [
    {
      "ID": "MoveToGlacier",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ]
    },
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}
```

### RDS Unused Instances
RDS instances running when not needed:

```bash
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,StorageType,AllocatedStorage,DBInstanceStatus]'
```

Check for idle databases:

```bash
# Maximum shows whether any connection was open at all during each day
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-db \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Maximum
```

If connections are consistently zero, stop the instance (RDS automatically restarts a stopped instance after seven days):

```bash
aws rds stop-db-instance --db-instance-identifier my-db
```

For temporary databases, delete the instance (`--skip-final-snapshot` means no backup is kept):

```bash
aws rds delete-db-instance \
  --db-instance-identifier temp-db \
  --skip-final-snapshot
```

### Lambda Over-Provisioned Memory
Lambda functions with a high memory allocation that are invoked frequently:

```bash
aws lambda list-functions \
  --query 'Functions[*].[FunctionName,MemorySize,Timeout]'
```

Check Lambda invocations:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

Check memory utilization (this requires Lambda Insights to be enabled on the function; its metrics are published in the `LambdaInsights` namespace):

```bash
# An hourly period keeps a 7-day query under the 1,440-datapoint limit per call
aws cloudwatch get-metric-statistics \
  --namespace LambdaInsights \
  --metric-name memory_utilization \
  --dimensions Name=function_name,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Maximum
```

If maximum memory utilization stays below 50%, reduce the allocation:

```bash
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 256
```

### EBS Unattached Volumes
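EBS volumes not attached to instances:

```bash
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,State,Attachments]'
```

Delete unattached volumes:

```bash
aws ec2 delete-volume --volume-id vol-unused
```

Or snapshot before deletion:

```bash
SNAPSHOT_ID=$(aws ec2 create-snapshot --volume-id vol-unused --description "Backup before deletion" --query SnapshotId --output text)
# Wait for the snapshot to complete before deleting the volume
aws ec2 wait snapshot-completed --snapshot-ids "$SNAPSHOT_ID"
aws ec2 delete-volume --volume-id vol-unused
```

### Snapshots Accumulation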
EBS snapshots accumulating over time (listed oldest first):

```bash
# Sort inside the query; piping table output through sort scrambles the table
aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'sort_by(Snapshots, &StartTime)[*].[SnapshotId,VolumeSize,StartTime,Description]' \
  --output table
```

Delete old snapshots:

```bash
aws ec2 delete-snapshot --snapshot-id snap-old
```

Automate cleanup with a lifecycle policy:

```bash
aws dlm create-lifecycle-policy \
  --execution-role-arn arn:aws:iam::123456789012:role/dlm-role \
  --description "Delete old snapshots" \
  --state ENABLED \
  --policy-details file://dlm-policy.json
```

### Elastic IP Addresses
Unattached Elastic IPs cost $0.005/hour:

```bash
aws ec2 describe-addresses \
  --query 'Addresses[*].[PublicIp,AllocationId,AssociationId,InstanceId]'
```

Addresses without an `AssociationId` are unattached. Release them:

```bash
aws ec2 release-address --allocation-id eipalloc-unused
```

### CloudWatch Logs Accumulation
Log groups accumulating data:

```bash
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,storedBytes,retentionInDays]'
```

Set retention:

```bash
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30
```

Delete unused log groups:

```bash
aws logs delete-log-group --log-group-name /unused/logs
```

### Untagged Resources
Resources without tags can't be attributed to projects. List the resources that are missing a `Project` tag:

```bash
# This API only returns resources that are, or once were, tagged
aws resourcegroupstaggingapi get-resources \
  --query 'ResourceTagMappingList[?length(Tags[?Key==`Project`]) == `0`].ResourceARN'
```

Tag resources for better tracking:

```bash
aws ec2 create-tags \
  --resources i-12345 \
  --tags Key=Project,Value=my-project Key=Environment,Value=production Key=Owner,Value=my-team
```

Enable cost allocation tags:

```bash
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Project,Status=Active
```

## Verification Steps
After cleanup, verify the cost reduction:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-25,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```

Set up cost anomaly detection:
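```bash
# A DIMENSIONAL monitor on SERVICE evaluates every AWS service in the account
MONITOR_ARN=$(aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName":"CostAnomaly","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}' \
  --query MonitorArn --output text)

# ops@example.com is a placeholder address; DAILY summaries are delivered by email
aws ce create-anomaly-subscription \
  --anomaly-subscription "{\"SubscriptionName\":\"CostAlert\",\"MonitorArnList\":[\"$MONITOR_ARN\"],\"Subscribers\":[{\"Type\":\"EMAIL\",\"Address\":\"ops@example.com\"}],\"Threshold\":100,\"Frequency\":\"DAILY\"}"
```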
Set up billing alerts (billing metrics are published only in `us-east-1`, and billing alerts must be enabled in the account's billing preferences):

```bash
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name monthly-billing-threshold \
  --alarm-description "Monthly AWS bill exceeds threshold" \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts
```

Create a comprehensive cost audit script:
```bash
#!/bin/bash

echo "AWS Cost Investigation Report"
echo "=============================="

echo ""
echo "1. Monthly Cost by Service:"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output table

echo ""
echo "2. Running EC2 Instances:"
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,LaunchTime]' \
  --output table

echo ""
echo "3. NAT Gateways:"
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State]' \
  --output table

echo ""
echo "4. Unattached EBS Volumes:"
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType]' \
  --output table

echo ""
echo "5. RDS Instances:"
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus]' \
  --output table

echo ""
echo "6. Elastic IPs (Unattached):"
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
  --output table

echo ""
echo "7. S3 Bucket Sizes:"
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
  # Only the StandardStorage class is counted; a bucket with no datapoint prints "None"
  size=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
    --start-time $(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 86400 \
    --statistics Average \
    --query 'Datapoints[0].Average' \
    --output text 2>/dev/null || echo "0")
  echo "$bucket: $size bytes"
done

echo ""
echo "8. Daily Total Cost (Last 7 Days):"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '7 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
  --output table
```
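
The daily totals in the last step of the script still have to be scanned by eye to find the day a spike began. As a possible follow-up, the sketch below flags any day whose cost is at least a given multiple of the previous day's. It assumes the totals have been exported as one `DATE AMOUNT` pair per line (for example with `--output text`); `find_spikes`, the 1.5 default ratio, and the sample figures are all hypothetical.

```bash
#!/bin/sh
# Flag day-over-day cost jumps (hypothetical helper).
# $1: ratio threshold (default 1.5, meaning a 50% increase over the previous day).
# Reads whitespace-separated "DATE AMOUNT" lines on stdin, oldest first.
find_spikes() {
  awk -v ratio="${1:-1.5}" '
    NR > 1 && prev > 0 && $2 / prev >= ratio {
      printf "%s %.2f -> %.2f\n", $1, prev, $2
    }
    { prev = $2 }   # remember this day for the next comparison
  '
}

# Illustrative data only, not real billing output.
printf '%s\n' \
  "2026-03-01 40.10" \
  "2026-03-02 41.00" \
  "2026-03-03 95.50" \
  "2026-03-04 96.20" | find_spikes 1.5
```

With the sample data, only 2026-03-03 is reported, because the following day stays at the new, higher level instead of jumping again. The first flagged date is a reasonable starting point for the per-service and per-usage-type queries.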