# Fix AWS CloudWatch Alarm Not Triggering

You set up a CloudWatch alarm expecting it to notify you when something goes wrong, but it never triggers—even when the metric clearly exceeds the threshold. Or maybe it triggers sometimes but not consistently. Understanding why alarms fail to trigger requires looking at the alarm configuration, the metric data, and the evaluation process.

## Diagnosis Commands

First, get the alarm configuration:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].[AlarmName,AlarmDescription,StateValue,MetricName,Namespace]'
```

Get full alarm details:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0]'
```

Check the alarm's current state:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].[AlarmName,StateValue,StateReason,StateReasonData]'
```

Get the alarm history to see past transitions:

```bash
# State transitions are recorded under the StateUpdate history item type
aws cloudwatch describe-alarm-history \
  --alarm-name my-alarm \
  --history-item-type StateUpdate \
  --start-date "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --max-items 20 \
  --query 'AlarmHistoryItems[*].[Timestamp,HistorySummary,HistoryData]'
```
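The `date -u -d '7 days ago'` form used in these commands is GNU `date` syntax and fails on macOS and other BSD systems. A minimal sketch of a portable wrapper, assuming either GNU or BSD `date` is installed; `iso_hours_ago` is a hypothetical helper name, not part of the AWS CLI:

```bash
#!/usr/bin/env bash
# iso_hours_ago (hypothetical helper): prints the UTC timestamp N hours ago
# in the ISO 8601 form used for --start-time and --start-date above.
iso_hours_ago() {
  local hours=$1
  # GNU date (Linux) understands -d; BSD date (macOS) uses -v instead
  date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-"${hours}"H +%Y-%m-%dT%H:%M:%SZ
}

iso_hours_ago 168   # seven days ago
```

The helper can then stand in for the inline `date` calls, for example `--start-date "$(iso_hours_ago 168)"`.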

Check the metric data directly:

```bash
# A 300-second period keeps 24 hours of data well under the
# 1,440-datapoint limit for a single get-metric-statistics call
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average Maximum \
  --output table
```

Verify the metric exists:

```bash
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --query 'Metrics[*].Dimensions'
```

Check if SNS notifications are configured:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[*].AlarmActions'
```

Verify SNS topic exists and has subscriptions:

```bash
aws sns list-topics \
  --query 'Topics[*].[TopicArn]'

aws sns list-subscriptions-by-topic \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --query 'Subscriptions[*].[Endpoint,Protocol,SubscriptionArn]'
```

## Common Causes and Solutions

### Metric Not Matching

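The alarm's dimensions don't match the actual metric dimensions. CloudWatch treats every unique combination of dimensions as a separate metric, so an alarm with a missing, extra, or misspelled dimension watches a metric that never receives data:

```bash
# Check alarm dimensions
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].Dimensions'

# Check available metrics with their dimensions
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --query 'Metrics[*].Dimensions'
```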

Fix by recreating the alarm with correct dimensions:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm-fixed \
  --alarm-description "CPU utilization over 80%" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

### Insufficient Evaluation Periods

The alarm requires multiple breaches before triggering:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[EvaluationPeriods,Threshold,ComparisonOperator]'
```

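If EvaluationPeriods is high, the metric must breach the threshold for that many consecutive periods before the alarm fires. The exception is when DatapointsToAlarm is set to a lower number, in which case only that many of the evaluated periods need to breach.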
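The effect of the evaluation window can be sketched in plain shell, with no AWS calls. This is a simplified model that assumes a GreaterThanOrEqualToThreshold comparison and ignores DatapointsToAlarm and TreatMissingData; `evaluate_alarm` and the sample values are hypothetical, not CloudWatch output:

```bash
#!/usr/bin/env bash
# evaluate_alarm (hypothetical helper): THRESHOLD PERIODS VALUE...
# Prints ALARM only when each of the last PERIODS values is >= THRESHOLD.
evaluate_alarm() {
  local threshold=$1 periods=$2 value
  shift 2
  # Fewer datapoints than the evaluation window requires
  if [ "$#" -lt "$periods" ]; then
    echo "INSUFFICIENT_DATA"
    return
  fi
  # Drop older datapoints so only the most recent PERIODS remain
  while [ "$#" -gt "$periods" ]; do shift; done
  for value in "$@"; do
    # awk handles decimal comparisons that shell arithmetic cannot
    if ! awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v >= t) }'; then
      echo "OK"
      return
    fi
  done
  echo "ALARM"
}

evaluate_alarm 80 3 40 50 95   # only the latest period breaches: OK
evaluate_alarm 80 1 40 50 95   # same data, one evaluation period: ALARM
evaluate_alarm 80 3 85 90 95   # three consecutive breaches: ALARM
```

With three evaluation periods, a single breaching datapoint is not enough; the same data trips the alarm as soon as the window is reduced to one period.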

Reduce evaluation periods for faster response:

```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value INSUFFICIENT_DATA \
  --state-reason "Resetting alarm for update"

aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --alarm-description "CPU utilization over 80%" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

### Missing Metric Data (INSUFFICIENT_DATA)

If the metric isn't being published, the alarm stays in INSUFFICIENT_DATA state:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].StateValue'
```

Check if metric data exists:

```bash
aws cloudwatch get-metric-statistics \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average
```

If using custom metrics, ensure your application is publishing them:

```bash
# Publish a test metric
aws cloudwatch put-metric-data \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --value 50 \
  --unit Count
```

For Lambda custom metrics:

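```javascript
// Lambda code to publish metrics (AWS SDK for JavaScript v2)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

exports.handler = async (event) => {
  const start = Date.now();
  // ... the function's actual work goes here ...
  const processingTime = Date.now() - start;

  // await is only valid inside the async handler
  await cloudwatch.putMetricData({
    Namespace: 'MyApplication',
    MetricData: [{
      MetricName: 'ProcessingTime',
      Value: processingTime,
      Unit: 'Milliseconds',
      Dimensions: [{ Name: 'FunctionName', Value: 'MyFunction' }]
    }]
  }).promise();
};
```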

### Period vs Threshold Mismatch

The period determines how metric data is aggregated. A longer period smooths out spikes:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[Period,Threshold,EvaluationPeriods]'
```

If the period is 5 minutes and the threshold is 80%, a 1-minute spike to 95% can be averaged away under the Average statistic and never trigger the alarm.
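The arithmetic behind this can be checked in plain shell, with no AWS calls. The five sample values and the `period_stats` helper below are illustrative assumptions, not CloudWatch output:

```bash
#!/usr/bin/env bash
# period_stats (hypothetical helper): prints the Average and Maximum of the
# 1-minute samples passed as arguments, as one aggregated period would report.
period_stats() {
  printf '%s\n' "$@" | awk '
    NR == 1 { max = $1 + 0 }
    { sum += $1; if ($1 + 0 > max) max = $1 + 0 }
    END { printf "Average=%.1f Maximum=%.1f\n", sum / NR, max }'
}

# Five 1-minute CPU samples with a single spike to 95%
period_stats 40 40 95 40 40
# → Average=51.0 Maximum=95.0
```

Against an 80% threshold, the 5-minute Average (51.0) stays well below the line while the Maximum (95.0) crosses it.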

Reduce period for spike detection:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average \
  --period 60 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```

Or use Maximum statistic instead of Average:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm-max \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Maximum \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1
```

### Wrong Statistic

Different statistics behave differently:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].Statistic'
```

Common statistics:

- Average: Mean value over the period
- Maximum: Highest value in the period
- Minimum: Lowest value in the period
- Sum: Total over the period
- SampleCount: Number of data points

For CPU utilization, Average might miss spikes. Use Maximum for spike detection:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-spike-alarm \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Maximum \
  --period 60 \
  --threshold 90 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1
```

### SNS Notification Failures

The alarm triggers but you don't get notified:

```bash
# Check if SNS topic exists
aws sns get-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts

# Check subscriptions
aws sns list-subscriptions-by-topic \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts
```

Test SNS delivery:

```bash
aws sns publish \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --message "Test notification"
```

If subscriptions are pending confirmation:

```bash
aws sns confirm-subscription \
  --topic-arn arn:aws:sns:us-east-1:123456789012:my-alerts \
  --token "confirmation-token-from-email"
```

### Alarm Missing Actions

The alarm triggers but has no actions configured:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[AlarmActions,OKActions,InsufficientDataActions]'
```

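Add alarm actions. put-metric-alarm replaces the whole alarm definition rather than patching it, so repeat the alarm's existing settings alongside the new flags. Include OK actions as well to get notified when the alarm returns to normal:

```bash
# The metric settings below mirror the earlier my-alarm examples; use your alarm's real values
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average --period 60 --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:my-alerts
```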

### Composite Alarm Issues

Composite alarms depend on other alarms:

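```bash
# describe-alarms returns only metric alarms unless composite alarms are requested
aws cloudwatch describe-alarms \
  --alarm-names my-composite-alarm \
  --alarm-types CompositeAlarm \
  --query 'CompositeAlarms[0].AlarmRule'
```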

If the underlying alarms aren't in the right state, the composite won't trigger.

Check all underlying alarms:

```bash
aws cloudwatch describe-alarms \
  --alarm-names underlying-alarm-1 underlying-alarm-2 \
  --query 'MetricAlarms[*].[AlarmName,StateValue]'
```

### TreatMissingData Setting

How the alarm handles missing data affects behavior:

```bash
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].TreatMissingData'
```

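Options:

- breaching: Treat missing data points as breaching the threshold (can trigger the alarm)
- notBreaching: Treat missing data points as not breaching (the alarm can return to OK)
- ignore: Ignore missing data points (maintains current state)
- missing: The default. If every data point in the evaluation range is missing, the alarm moves to INSUFFICIENT_DATA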
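As a rough illustration of these options, the sketch below maps each setting to the state an alarm would end up in. It is a deliberately simplified model that covers only the case where every datapoint in the evaluation window is missing; `state_when_all_missing` is a hypothetical helper, not an AWS CLI command:

```bash
#!/usr/bin/env bash
# state_when_all_missing (hypothetical helper): TREAT_MISSING_DATA CURRENT_STATE
# Simplified model for the case where the whole evaluation window has no data.
state_when_all_missing() {
  local treat=$1 current=$2
  case "$treat" in
    breaching)    echo "ALARM" ;;
    notBreaching) echo "OK" ;;
    ignore)       echo "$current" ;;
    missing)      echo "INSUFFICIENT_DATA" ;;
    *)            echo "unknown TreatMissingData value: $treat" >&2; return 1 ;;
  esac
}

# Compare the outcome of each setting for an alarm that is currently OK
for setting in breaching notBreaching ignore missing; do
  printf '%-13s -> %s\n' "$setting" "$(state_when_all_missing "$setting" OK)"
done
```

An alarm that never fires when its data source goes silent is often one left on notBreaching or ignore.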

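Set appropriate behavior. put-metric-alarm replaces the whole alarm definition, so repeat the alarm's existing settings alongside the new flag:

```bash
# The metric settings below mirror the earlier my-alarm examples; use your alarm's real values
aws cloudwatch put-metric-alarm \
  --alarm-name my-alarm \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --statistic Average --period 60 --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts \
  --treat-missing-data breaching
```

Use breaching for critical metrics where missing data indicates problems.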

## Verification Steps

Test alarm behavior by manually setting state:

```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value ALARM \
  --state-reason "Testing alarm notification"
```

Check if notification was received. Then reset:

```bash
aws cloudwatch set-alarm-state \
  --alarm-name my-alarm \
  --state-value OK \
  --state-reason "Reset after test"
```

Verify alarm triggers on actual threshold breach:

```bash
# If using custom metrics, publish high value
aws cloudwatch put-metric-data \
  --namespace MyCustomNamespace \
  --metric-name MyMetric \
  --value 95 \
  --unit Percent

# Wait for evaluation period + 1 minute, then check state
sleep 120
aws cloudwatch describe-alarms \
  --alarm-names my-alarm \
  --query 'MetricAlarms[0].[StateValue,StateReason]'
```

Create a comprehensive alarm testing script:

```bash
#!/bin/bash
ALARM_NAME="my-alarm"
SNS_TOPIC="arn:aws:sns:us-east-1:123456789012:my-alerts"

echo "CloudWatch Alarm Diagnostics"
echo "============================"

echo "1. Current Alarm State:"
aws cloudwatch describe-alarms \
  --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].[StateValue,StateReason,StateReasonData]'

echo ""
echo "2. Alarm Configuration:"
aws cloudwatch describe-alarms \
  --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].[MetricName,Namespace,Statistic,Period,Threshold,EvaluationPeriods,ComparisonOperator,Dimensions]'

echo ""
echo "3. Recent Metric Data (last hour):"
# Look up the alarm's metric identity once and reuse it below
NAMESPACE=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].Namespace' --output text)
METRIC_NAME=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].MetricName' --output text)
# Builds "Name=a,Value=b Name=c,Value=d": the CLI expects one argument per dimension
DIMENSIONS=$(aws cloudwatch describe-alarms --alarm-names "$ALARM_NAME" \
  --query 'MetricAlarms[0].Dimensions' --output json |
  jq -r 'map("Name=\(.Name),Value=\(.Value)") | join(" ")')
# DIMENSIONS is left unquoted on purpose so each pair becomes its own argument;
# the flag is omitted entirely when the alarm has no dimensions
aws cloudwatch get-metric-statistics \
  --namespace "$NAMESPACE" \
  --metric-name "$METRIC_NAME" \
  ${DIMENSIONS:+--dimensions $DIMENSIONS} \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average Maximum

echo ""
echo "4. SNS Topic Status:"
aws sns list-subscriptions-by-topic \
  --topic-arn "$SNS_TOPIC" \
  --query 'Subscriptions[*].[Endpoint,Protocol,SubscriptionArn]'

echo ""
echo "5. Testing SNS delivery..."
aws sns publish \
  --topic-arn "$SNS_TOPIC" \
  --message "Alarm diagnostic test - please confirm receipt"
echo "Check your notification endpoint to confirm delivery."
```

Set up alarm monitoring. CloudWatch does not appear to publish a metric for alarm states, so an alarm on one would itself sit in INSUFFICIENT_DATA. A scheduled check that lists alarms stuck in that state is a more reliable starting point:

```bash
# List alarms currently in INSUFFICIENT_DATA (meta-monitoring);
# run on a schedule and forward the output to your ops channel
aws cloudwatch describe-alarms \
  --state-value INSUFFICIENT_DATA \
  --query 'MetricAlarms[*].[AlarmName,StateUpdatedTimestamp]' \
  --output table
```