# Fix AWS Unexpected High Costs Investigation

You log into AWS and see a billing alert—your monthly costs have spiked unexpectedly. Maybe it's 2x, 5x, or even 10x your normal bill. Before you panic, you need to systematically investigate where the money is going. AWS provides multiple tools for cost analysis, but navigating them requires knowing where to look.

This guide covers investigating cost spikes, identifying culprits, and implementing fixes.

## Diagnosis Commands

First, get an overview of your costs:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get a daily breakdown to find when the spike started. Note that when you add `--group-by`, the `Total` field comes back empty and amounts appear under `Groups` instead, so for daily totals leave the grouping off:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```
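To pinpoint the spike day from that daily output, a small jq/awk pass works. The sketch below runs on canned sample JSON standing in for the live CLI output (pipe the real output in instead); it assumes `jq` is installed:

```bash
# Sample of what `aws ce get-cost-and-usage --granularity DAILY` returns;
# in practice, pipe the live CLI output into the jq command instead.
cat > /tmp/daily-costs.json <<'EOF'
{"ResultsByTime":[
  {"TimePeriod":{"Start":"2026-03-01"},"Total":{"BlendedCost":{"Amount":"12.50"}}},
  {"TimePeriod":{"Start":"2026-03-02"},"Total":{"BlendedCost":{"Amount":"13.10"}}},
  {"TimePeriod":{"Start":"2026-03-03"},"Total":{"BlendedCost":{"Amount":"98.40"}}}
]}
EOF

# Extract (day, amount) pairs, then flag the largest day-over-day jump.
spike=$(jq -r '.ResultsByTime[] | [.TimePeriod.Start, .Total.BlendedCost.Amount] | @tsv' /tmp/daily-costs.json \
  | awk -F'\t' 'NR>1 && $2-prev>max {max=$2-prev; day=$1} {prev=$2} END {printf "%s +%.2f", day, max}')
echo "largest jump: $spike"
```

On the sample data this flags 2026-03-03, the day the bill jumped by roughly $85.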

Compare with previous month:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-02-01,End=2026-03-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].Total.BlendedCost.Amount'
```

Get cost by region:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=REGION \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Get cost by usage type (for identifying specific charges):

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=USAGE_TYPE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output text | sort -k2 -n -r | head -20
```

Get cost by resource (grouping by `RESOURCE_ID` requires resource-level data to be enabled in the Cost Explorer settings, and historical coverage is limited):

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=RESOURCE_ID \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]'
```

Check reserved instance and Savings Plans coverage:

```bash
aws ce get-reservation-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --query 'CoveragesByTime[*].Total.CoverageHours.CoverageHoursPercentage'

aws ce get-savings-plans-coverage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY
```

## Common Cost Spikes and Solutions

### EC2 Running Instances

The most common culprit—running EC2 instances you don't need:

```bash
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,LaunchTime,Tags[?Key==`Name`].Value]'
```

Identify idle instances:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average \
  --query 'Datapoints[*].[Timestamp,Average]'
```

Instances with average CPU < 5% might be candidates for stopping:

```bash
aws ec2 stop-instances --instance-ids i-idle-instance
```

For instances that have been unused long-term, terminate them (this is irreversible and discards any instance-store data):

```bash
aws ec2 terminate-instances --instance-ids i-unused-instance
```

### NAT Gateway Charges

NAT gateways cost ~$0.045/hour plus data processing charges:

```bash
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId]'
```

Check NAT gateway data usage:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-12345 \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum \
  --query 'Datapoints[*].[Timestamp,Sum]'
```
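A quick back-of-envelope estimate shows what a gateway actually costs per month. The rates below are assumed us-east-1 pricing ($0.045/hour base plus $0.045/GB processed) and the traffic figure is hypothetical; check current rates for your region:

```bash
# Rough monthly NAT gateway cost: hourly base charge + per-GB data processing.
hours_per_month=730
gb_processed=500   # hypothetical monthly traffic through the gateway
monthly=$(awk -v h="$hours_per_month" -v g="$gb_processed" \
  'BEGIN {printf "%.2f", h * 0.045 + g * 0.045}')
echo "estimated monthly cost: \$$monthly"
```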

If a NAT gateway has minimal usage, consider:

- Deleting the unused NAT gateway
- Replacing it with a NAT instance (cheaper for low throughput)
- Using VPC endpoints so traffic to AWS services bypasses the NAT entirely
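For the VPC endpoint option, a gateway endpoint for S3 removes S3 traffic from the NAT path at no hourly charge. A sketch, where `vpc-12345` and `rtb-12345` are placeholder IDs you must replace; the command is printed for review rather than executed:

```bash
# Build the create-vpc-endpoint call; echo it so it can be reviewed before running.
region="us-west-2"
cmd="aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.${region}.s3 \
  --route-table-ids rtb-12345"
echo "$cmd"
```

Gateway endpoints exist for S3 and DynamoDB; other services use interface endpoints, which do carry an hourly charge.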

Delete unused NAT gateway:

```bash
aws ec2 delete-nat-gateway --nat-gateway-id nat-unused
```

### Data Transfer Charges

Data transfer out of AWS to the internet is expensive (roughly $0.09/GB for the first 10 TB in most US regions). Usage type names are region-prefixed (`USW2-` is us-west-2), so adjust the filter values to your region:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --filter '{"Dimensions":{"Key":"USAGE_TYPE","Values":["USW2-DataTransfer-Out-Bytes"]}}' \
  --query 'ResultsByTime[*].Total.BlendedCost.Amount'
```

Check CloudFront usage (often cheaper for data transfer):

```bash aws cloudfront get-distribution \ --id E1234567890ABC \ --query 'Distribution.DistributionConfig'

aws cloudwatch get-metric-statistics \ --namespace AWS/CloudFront \ --metric-name BytesDownloaded \ --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \ --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \ --period 86400 \ --statistics Sum ```

Reduce data transfer:

- Use CloudFront for content delivery
- Use VPC endpoints for AWS service access
- Compress data before transfer
- Prefer same-region replication for S3 over cross-region
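On the compression point, even plain gzip before upload can cut egress bytes substantially for text-heavy payloads. A minimal local demonstration:

```bash
# Generate a repetitive text payload, then compare raw vs gzipped size.
printf 'repeated payload %.0s' $(seq 1 200) > /tmp/payload.txt
gzip -9 -c /tmp/payload.txt > /tmp/payload.txt.gz
orig=$(wc -c < /tmp/payload.txt)
comp=$(wc -c < /tmp/payload.txt.gz)
echo "original: $orig bytes, gzipped: $comp bytes"
```

Real savings depend entirely on how compressible your data is; already-compressed formats (JPEG, video, zip) gain nothing.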

### S3 Storage Costs

S3 storage accumulates quietly over time. List buckets and check their metrics configurations:

```bash
aws s3api list-buckets \
  --query 'Buckets[*].[Name,CreationDate]'

aws s3api get-bucket-metrics-configuration \
  --bucket my-bucket \
  --id EntireBucket
```

Get bucket size:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average
```

Check for large objects:

```bash
aws s3 ls s3://my-bucket --recursive --summarize | sort -k3 -n -r | head -20
```

Reduce S3 costs:

- Move old objects to S3 Glacier
- Delete old versions if versioning is enabled
- Use lifecycle policies:

```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```

Where `lifecycle.json` contains:

```json
{
  "Rules": [
    {
      "ID": "MoveToGlacier",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ]
    },
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}
```

### RDS Unused Instances

RDS instances running when not needed:

```bash
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,StorageType,AllocatedStorage,DBInstanceStatus]'
```

Check for idle databases:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-db \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

If connections are consistently zero, stop the instance (note that a stopped RDS instance is automatically restarted after seven days):

```bash
aws rds stop-db-instance --db-instance-identifier my-db
```

For temporary databases you no longer need (irreversible; `--skip-final-snapshot` means no final backup is kept):

```bash
aws rds delete-db-instance \
  --db-instance-identifier temp-db \
  --skip-final-snapshot
```

### Lambda Over-Provisioned Memory

Frequently invoked Lambda functions with generous memory allocations add up quickly:

```bash
aws lambda list-functions \
  --query 'Functions[*].[FunctionName,MemorySize,Timeout]'
```

Check Lambda invocations:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Sum
```

Check memory utilization. Lambda does not publish a memory metric in the `AWS/Lambda` namespace; the metric below comes from Lambda Insights, which must be enabled on the function (alternatively, read `Max Memory Used` from the `REPORT` lines in the function's CloudWatch Logs). The period is hourly because a 60-second period over 7 days exceeds the API's 1,440-datapoint limit:

```bash
aws cloudwatch get-metric-statistics \
  --namespace LambdaInsights \
  --metric-name memory_utilization \
  --dimensions Name=function_name,Value=my-function \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Maximum
```

If max memory < 50%, reduce allocation:

```bash
aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 256
```

### EBS Unattached Volumes

EBS volumes not attached to instances:

```bash
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,State,Attachments]'
```

Delete unattached volumes:

```bash
aws ec2 delete-volume --volume-id vol-unused
```

Or snapshot before deletion:

```bash
aws ec2 create-snapshot --volume-id vol-unused --description "Backup before deletion"
aws ec2 delete-volume --volume-id vol-unused
```

### Snapshot Accumulation

EBS snapshots accumulate over time. List them oldest-first (ISO timestamps sort lexically, so plain `sort` on the StartTime column works):

```bash
aws ec2 describe-snapshots \
  --owner-ids self \
  --query 'Snapshots[*].[SnapshotId,VolumeSize,StartTime,Description]' \
  --output text | sort -k3
```

Delete old snapshots:

```bash
aws ec2 delete-snapshot --snapshot-id snap-old
```

Automate cleanup with an Amazon Data Lifecycle Manager (DLM) policy:

```bash
aws dlm create-lifecycle-policy \
  --execution-role-arn arn:aws:iam::123456789012:role/dlm-role \
  --description "Delete old snapshots" \
  --state ENABLED \
  --policy-details file://dlm-policy.json
```
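The `dlm-policy.json` file referenced above is not shown; a plausible minimal policy looks like the following, where the target tag and schedule values are illustrative — DLM snapshots volumes matching `TargetTags` daily and keeps the last seven:

```json
{
  "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{"Key": "Backup", "Value": "true"}],
  "Schedules": [
    {
      "Name": "DailySnapshot",
      "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
      "RetainRule": {"Count": 7}
    }
  ]
}
```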

### Elastic IP Addresses

Unattached Elastic IPs cost $0.005/hour (AWS now bills all public IPv4 addresses at this rate, so an unattached one is pure waste):

```bash
aws ec2 describe-addresses \
  --query 'Addresses[*].[PublicIp,AllocationId,AssociationId,InstanceId]'
```

Addresses without AssociationId are unattached. Release them:

```bash
aws ec2 release-address --allocation-id eipalloc-unused
```

### CloudWatch Logs Accumulation

Log groups accumulate data indefinitely unless a retention policy is set:

```bash
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,storedBytes,retentionInDays]'
```
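To rank log groups by size, feed the `describe-log-groups` output through jq. The sample JSON below stands in for the live output; `jq` is assumed to be installed:

```bash
# Sample describe-log-groups output; replace with the live CLI call in practice.
cat > /tmp/log-groups.json <<'EOF'
{"logGroups":[
  {"logGroupName":"/aws/lambda/small-fn","storedBytes":52428800},
  {"logGroupName":"/aws/lambda/big-fn","storedBytes":1073741824}
]}
EOF

# Sort descending by storedBytes and report the largest group.
biggest=$(jq -r '.logGroups | sort_by(-.storedBytes) | .[0].logGroupName' /tmp/log-groups.json)
echo "largest log group: $biggest"
```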

Set retention:

```bash
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30
```

Delete unused log groups:

```bash
aws logs delete-log-group --log-group-name /unused/logs
```

### Untagged Resources

Resources without tags can't be attributed to teams or projects. List resources that carry no tags at all:

```bash
aws resourcegroupstaggingapi get-resources \
  --query 'ResourceTagMappingList[?length(Tags)==`0`].ResourceARN'
```
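The tagging API cannot directly return resources missing a *specific* key, so to find everything lacking a `Project` tag, list all resources and filter client-side with jq. The sample JSON stands in for live `get-resources` output:

```bash
# Sample get-resources output; in practice pipe the live CLI output into jq.
cat > /tmp/resources.json <<'EOF'
{"ResourceTagMappingList":[
  {"ResourceARN":"arn:aws:ec2:us-west-2:123456789012:instance/i-tagged",
   "Tags":[{"Key":"Project","Value":"web"}]},
  {"ResourceARN":"arn:aws:ec2:us-west-2:123456789012:volume/vol-untagged",
   "Tags":[]}
]}
EOF

# Keep only resources whose tag keys do not include "Project".
untagged=$(jq -r '.ResourceTagMappingList[]
  | select([.Tags[].Key] | index("Project") | not)
  | .ResourceARN' /tmp/resources.json)
echo "$untagged"
```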

Tag resources for better tracking:

```bash
aws ec2 create-tags \
  --resources i-12345 \
  --tags Key=Project,Value=my-project Key=Environment,Value=production Key=Owner,Value=my-team
```

Enable cost allocation tags:

```bash
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Project,Status=Active
```

## Verification Steps

After cleanup, verify cost reduction:

```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-25,End=2026-04-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```

Set up cost anomaly detection. The CLI takes JSON structures via `--anomaly-monitor` and `--anomaly-subscription`; substitute the monitor ARN returned by the first command, and your own subscriber address, before running the second:

```bash
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName":"CostAnomaly","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}'

aws ce create-anomaly-subscription \
  --anomaly-subscription '{"SubscriptionName":"CostAlert","Threshold":100,"Frequency":"DAILY","MonitorArnList":["arn:aws:ce::123456789012:anomalymonitor/CostAnomaly"],"Subscribers":[{"Type":"EMAIL","Address":"billing-alerts@example.com"}]}'
```

Set up billing alerts (the `AWS/Billing` metric is published only in us-east-1 and requires "Receive Billing Alerts" to be enabled in the account's billing preferences):

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name monthly-billing-threshold \
  --alarm-description "Monthly AWS bill exceeds threshold" \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 86400 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts
```

Create a comprehensive cost audit script:

```bash
#!/bin/bash

echo "AWS Cost Investigation Report"
echo "=============================="

echo ""
echo "1. Monthly Cost by Service:"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[*].Groups[*].[Keys[0],Metrics.BlendedCost.Amount]' \
  --output table

echo ""
echo "2. Running EC2 Instances:"
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,LaunchTime]' \
  --output table

echo ""
echo "3. NAT Gateways:"
aws ec2 describe-nat-gateways \
  --query 'NatGateways[*].[NatGatewayId,State]' \
  --output table

echo ""
echo "4. Unattached EBS Volumes:"
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType]' \
  --output table

echo ""
echo "5. RDS Instances:"
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus]' \
  --output table

echo ""
echo "6. Elastic IPs (Unattached):"
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
  --output table

echo ""
echo "7. S3 Bucket Sizes:"
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
  size=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/S3 \
    --metric-name BucketSizeBytes \
    --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage \
    --start-time $(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 86400 \
    --statistics Average \
    --query 'Datapoints[0].Average' \
    --output text 2>/dev/null || echo "0")
  echo "$bucket: $size bytes"
done

echo ""
echo "8. Daily Costs (Last 7 Days):"
aws ce get-cost-and-usage \
  --time-period Start=$(date -u -d '7 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) \
  --granularity DAILY \
  --metrics BlendedCost \
  --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]'
```