# Fix AWS CloudWatch Alarm Not Triggering
You set up a CloudWatch alarm expecting it to notify you when something goes wrong, but it never triggers—even when the metric clearly exceeds the threshold. Or maybe it triggers sometimes but not consistently. Understanding why alarms fail to trigger requires looking at the alarm configuration, the metric data, and the evaluation process.
## Diagnosis Commands
First, get the alarm configuration:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].[AlarmName,AlarmDescription,StateValue,MetricName,Namespace]'
```

Get full alarm details:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0]'
```

Check the alarm's current state:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].[AlarmName,StateValue,StateReason,StateReasonData]'
```

Get the alarm history to see past transitions:
```bash
aws cloudwatch describe-alarm-history \
  --alarm-name my-alarm \
  --history-types StateTransition \
  --start-date $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --max-items 20 \
  --query 'AlarmHistoryItems[*].[Timestamp,HistorySummary,HistoryData]'
```

Check the metric data directly:
```bash
# Note: "date -d" is GNU date syntax; on macOS/BSD use e.g. "date -u -v-24H +%Y-%m-%dT%H:%M:%SZ"
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average,Maximum \
  --output table
```

Verify the metric exists:
```bash
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --query 'Metrics[*].Dimensions'
```

Check if SNS notifications are configured:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].AlarmActions'
```

Verify the SNS topic exists and has subscriptions:
```bash
aws sns list-topics \
  --query 'Topics[*].[TopicArn]'

aws sns list-subscriptions-by-topic \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --query 'Subscriptions[*].[Endpoint,Protocol,SubscriptionArn]'
```
## Common Causes and Solutions
### Metric Not Matching
The alarm's dimensions don't match the actual metric dimensions:
```bash
# Check alarm dimensions
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].Dimensions'

# Check available metrics with their dimensions
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --query 'Metrics[*].Dimensions'
```
Fix by recreating the alarm with correct dimensions:
```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm-fixed \
  --alarm-description "CPU utilization over 80%" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

### Insufficient Evaluation Periods
The alarm requires multiple breaches before triggering:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[EvaluationPeriods,Threshold,ComparisonOperator]'
```

If EvaluationPeriods is high, the metric must exceed the threshold for that many consecutive periods before the alarm fires.
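The consecutive-period rule can be sketched in plain shell; the threshold, period count, and datapoints below are illustrative, not from a real alarm:

```shell
# Sketch: an alarm fires only after "periods" consecutive breaches.
evaluate_alarm() {
  local threshold=$1 periods=$2; shift 2
  local streak=0 v
  for v in "$@"; do
    if [ "$v" -ge "$threshold" ]; then
      streak=$((streak + 1))   # another consecutive breach
    else
      streak=0                 # any non-breaching period resets the streak
    fi
    if [ "$streak" -ge "$periods" ]; then
      echo ALARM
      return
    fi
  done
  echo OK
}

# Isolated breaches are not enough when evaluation-periods is 3:
evaluate_alarm 80 3 70 85 70 90 75   # prints OK
# Three consecutive breaches trigger:
evaluate_alarm 80 3 70 85 90 95 60   # prints ALARM
```

This is why a noisy metric with short spikes can sit at OK forever under a high EvaluationPeriods setting.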
Reduce evaluation periods for faster response:
```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value INSUFFICIENT_DATA \
  --state-reason "Resetting alarm for update"

aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --alarm-description "CPU utilization over 80%" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```
### Missing Metric Data (INSUFFICIENT_DATA)
If the metric isn't being published, the alarm stays in INSUFFICIENT_DATA state:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].StateValue'
```

Check if metric data exists:
```bash
aws cloudwatch get-metric-statistics \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average
```

If using custom metrics, ensure your application is publishing them:
```bash
# Publish a test metric
aws cloudwatch put-metric-data \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --value 50 \
  --unit Count
```

For Lambda custom metrics:
```javascript
// Lambda code to publish metrics
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

await cloudwatch.putMetricData({
  Namespace: 'MyApplication',
  MetricData: [{
    MetricName: 'ProcessingTime',
    Value: processingTime,
    Unit: 'Milliseconds',
    Dimensions: [{ Name: 'FunctionName', Value: 'MyFunction' }]
  }]
}).promise();
```
### Period vs Threshold Mismatch
The period determines how metric data is aggregated. A longer period smooths out spikes:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[Period,Threshold,EvaluationPeriods]'
```

If the period is 5 minutes, the statistic is Average, and the threshold is 80%, a brief spike to 95% for 1 minute gets averaged down with the quieter minutes and may never trigger.
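The smoothing effect is easy to see with made-up numbers; the five per-minute samples below are illustrative:

```shell
# Five 1-minute readings inside one 5-minute period, with a single spike to 95
samples="20 25 95 22 18"
avg=$(printf '%s\n' $samples | awk '{s+=$1} END {printf "%d", s/NR}')
max=$(printf '%s\n' $samples | awk 'NR==1 || $1>m {m=$1} END {print m}')
echo "Average=$avg"   # 36: well under an 80 threshold, so no alarm
echo "Maximum=$max"   # 95: a Maximum-based alarm would catch the spike
```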
Reduce period for spike detection:
```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

Or use the Maximum statistic instead of Average:
```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm-max \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Maximum \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1
```

### Wrong Statistic
Different statistics behave differently:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].Statistic'
```

Common statistics:
- Average: Mean value over the period
- Maximum: Highest value in the period
- Minimum: Lowest value in the period
- Sum: Total over the period
- SampleCount: Number of data points
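A quick way to internalize the difference is to compute all five statistics over the same datapoints; the values are illustrative:

```shell
# Five illustrative datapoints within one period
printf '%s\n' 10 50 90 30 20 | awk '
  { sum += $1; n++
    if (n == 1 || $1 < min) min = $1   # track lowest value
    if ($1 > max) max = $1 }           # track highest value
  END { printf "Average=%d Maximum=%d Minimum=%d Sum=%d SampleCount=%d\n",
        sum/n, max, min, sum, n }'
# prints: Average=40 Maximum=90 Minimum=10 Sum=200 SampleCount=5
```

The single 90 barely moves the Average but dominates the Maximum, which is why the choice of statistic decides whether a spike is visible to the alarm at all.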
For CPU utilization, Average might miss spikes. Use Maximum for spike detection:
```bash
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-spike-alarm \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Maximum \
  --period 60 \
  --threshold 90 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1
```

### SNS Notification Failures
The alarm triggers but you don't get notified:
```bash
# Check if SNS topic exists
aws sns get-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts

# Check subscriptions
aws sns list-subscriptions-by-topic \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts
```
Test SNS delivery:
```bash
aws sns publish \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --message "Test notification"
```

If subscriptions are pending confirmation:
```bash
aws sns confirm-subscription \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --token "confirmation-token-from-email"
```

### Alarm Missing Actions
Alarm triggers but has no actions configured:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[AlarmActions,OKActions,InsufficientDataActions]'
```

Add alarm actions:
```bash
# Note: put-metric-alarm replaces the entire alarm definition, so repeat the
# full existing configuration (metric, threshold, periods) along with the actions
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

Add OK actions to get notified when the alarm returns to normal:
```bash
# As above, include the full alarm configuration when updating
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --ok-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

### Composite Alarm Issues
Composite alarms depend on other alarms:
```bash
# describe-alarms returns only metric alarms by default, so request composites explicitly
aws cloudwatch describe-alarms \
  --alarm-names my-composite-alarm \
  --alarm-types CompositeAlarm \
  --query 'CompositeAlarms[0].AlarmRule'
```

If the underlying alarms aren't in the right state, the composite won't trigger.
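An `AND` rule such as `ALARM(alarm-1) AND ALARM(alarm-2)` behaves like this small sketch (the states passed in are illustrative):

```shell
# Sketch: an AND-style composite fires only when every child is in ALARM.
composite_state() {
  local s
  for s in "$@"; do
    [ "$s" = "ALARM" ] || { echo OK; return; }  # any non-ALARM child blocks it
  done
  echo ALARM
}

composite_state ALARM OK      # prints OK: one child not in ALARM
composite_state ALARM ALARM   # prints ALARM
```

So a composite that "never triggers" is often one child alarm stuck in OK or INSUFFICIENT_DATA.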
Check all underlying alarms:
```bash
aws cloudwatch describe-alarms \
  --alarm-names underlying-alarm-1 underlying-alarm-2 \
  --query 'MetricAlarms[*].[AlarmName,StateValue]'
```

### TreatMissingData Setting
How the alarm handles missing data affects behavior:
```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].TreatMissingData'
```

Options:
- breaching: Treat missing points as breaching the threshold (can push the alarm into ALARM)
- notBreaching: Treat missing points as within the threshold (the alarm tends toward OK)
- ignore: Skip missing data points and keep the current state
- missing (the default): Transition to INSUFFICIENT_DATA when data points are missing
Set appropriate behavior:
```bash
# Note: put-metric-alarm replaces the entire alarm definition, so include the
# full existing configuration along with this flag
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --treat-missing-data breaching
```

Use breaching for critical metrics where missing data itself indicates a problem, such as a heartbeat metric that stops when the publisher dies.
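A toy classifier makes the four modes concrete ("M" marks a missing sample; this ignores CloudWatch's real evaluation-range logic and is illustrative only):

```shell
# Sketch: how each treat-missing-data mode handles one datapoint
# against an 80 threshold. Real alarms evaluate a window of points.
classify() {
  local mode=$1 value=$2
  if [ "$value" = "M" ]; then
    case $mode in
      breaching)    echo ALARM ;;              # missing counts as a breach
      notBreaching) echo OK ;;                 # missing counts as healthy
      ignore)       echo UNCHANGED ;;          # keep whatever state we had
      missing)      echo INSUFFICIENT_DATA ;;  # the default behavior
    esac
  elif [ "$value" -ge 80 ]; then
    echo ALARM
  else
    echo OK
  fi
}

classify breaching M      # prints ALARM
classify notBreaching M   # prints OK
```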
## Verification Steps
Test alarm behavior by manually setting state:
```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value ALARM \
  --state-reason "Testing alarm notification"
```

The forced transition fires the alarm's actions just like a real breach, though CloudWatch overwrites the state at the next evaluation. Check that the notification was received, then reset:
```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value OK \
  --state-reason "Reset after test"
```

Verify the alarm triggers on an actual threshold breach:
```bash
# If using custom metrics, publish a high value
aws cloudwatch put-metric-data \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --value 95 \
  --unit Percent

# Wait for evaluation period + 1 minute, then check state
sleep 120
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[StateValue,StateReason]'
```
Create a comprehensive alarm testing script:
```bash
#!/bin/bash
ALARM_NAME="my-alarm"
SNS_TOPIC="arn:aws:sns:us-east-1:123456789012:my-alerts"

echo "CloudWatch Alarm Diagnostics"
echo "============================"

echo "1. Current Alarm State:"
aws cloudwatch describe-alarms \
  --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].[StateValue,StateReason,StateReasonData]'

echo ""
echo "2. Alarm Configuration:"
aws cloudwatch describe-alarms \
  --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].[MetricName,Namespace,Statistic,Period,Threshold,EvaluationPeriods,ComparisonOperator,Dimensions]'

echo ""
echo "3. Recent Metric Data (last hour):"
NAMESPACE=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].Namespace' --output text)
METRIC=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].MetricName' --output text)
# jq emits space-separated Name=...,Value=... pairs, the shorthand the CLI expects
DIMENSIONS=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].Dimensions' --output json |
  jq -r 'map("Name=\(.Name),Value=\(.Value)") | join(" ")')
aws cloudwatch get-metric-statistics \
  --namespace "$NAMESPACE" \
  --metric-name "$METRIC" \
  --dimensions $DIMENSIONS \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average,Maximum

echo ""
echo "4. SNS Topic Status:"
aws sns list-subscriptions-by-topic \
  --topic-arn "$SNS_TOPIC" \
  --query 'Subscriptions[*].[Endpoint,Protocol,SubscriptionArn]'

echo ""
echo "5. Testing SNS delivery..."
aws sns publish \
  --topic-arn "$SNS_TOPIC" \
  --message "Alarm diagnostic test - please confirm receipt"
echo "Check your notification endpoint to confirm delivery."
```
Set up alarm monitoring:
CloudWatch does not publish a built-in metric for alarm states, so meta-monitoring needs a signal you provide. One pattern (the `Ops/AlarmHealth` namespace and metric name below are examples): a scheduled job publishes the count of alarms stuck in INSUFFICIENT_DATA, and a second alarm watches that count:

```bash
# Run on a schedule: publish how many alarms are stuck in INSUFFICIENT_DATA
COUNT=$(aws cloudwatch describe-alarms \
  --state-value INSUFFICIENT_DATA \
  --query 'length(MetricAlarms)')
aws cloudwatch put-metric-data \
  --namespace Ops/AlarmHealth \
  --metric-name InsufficientDataAlarms \
  --value "$COUNT"

# Meta-monitoring alarm on that custom metric
aws cloudwatch put-metric-alarm \
  --alarm-name alarm-state-monitor \
  --alarm-description "Monitor for alarms stuck in INSUFFICIENT_DATA" \
  --namespace Ops/AlarmHealth \
  --metric-name InsufficientDataAlarms \
  --statistic Average \
  --period 300 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 6 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
```