Introduction

CloudWatch alarms that do not trigger despite threshold breaches create dangerous monitoring gaps. The usual causes are missing metrics, incorrect dimensions, or alarm configuration errors.

Symptoms

- Alarm state shows "INSUFFICIENT_DATA" persistently
- Metric value exceeds threshold but alarm stays "OK"
- Alarm never transitions to "ALARM" state
- SNS notification not sent for known issues
- Custom metrics not appearing in CloudWatch

Common Causes

- Metric not being published to CloudWatch
- Wrong namespace or dimensions in alarm configuration
- Evaluation period too long for transient issues
- Metric resolution (1-min vs 5-min) mismatch
- Cross-region alarm referencing wrong region metrics
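To make the evaluation-period cause concrete, the sketch below estimates how long a breach has to last before an alarm can fire. The function name `alarm_window_seconds` is hypothetical, and the calculation assumes every datapoint in the window must breach (that is, `DatapointsToAlarm` equals `EvaluationPeriods`).

```bash
# Hypothetical helper: approximate evaluation window of an alarm, in seconds.
# $1 = Period in seconds, $2 = EvaluationPeriods.
# Assumes all datapoints in the window must breach the threshold.
alarm_window_seconds() {
  echo $(( $1 * $2 ))
}

alarm_window_seconds 300 3   # prints 900 (15 minutes)
alarm_window_seconds 60 1    # prints 60
```

With a 300-second period and three evaluation periods, a two-minute spike is averaged into a single 5-minute datapoint and is unlikely to keep the alarm breaching for the full window.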

Step-by-Step Fix

1. **Check if metric exists**:
   ```bash
   aws cloudwatch list-metrics --namespace AWS/EC2 --metric-name CPUUtilization \
     --dimensions Name=InstanceId,Value=i-1234567890
   ```

2. **Check alarm configuration**:
   ```bash
   aws cloudwatch describe-alarms --alarm-names MyAlarm
   # Check: Namespace, Dimensions, Threshold, Period, EvaluationPeriods
   ```
3. **Verify metric data points exist**:
   ```bash
   # date -d is GNU date syntax; on macOS/BSD use: date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
   aws cloudwatch get-metric-statistics \
     --namespace AWS/EC2 --metric-name CPUUtilization \
     --dimensions Name=InstanceId,Value=i-1234567890 \
     --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
     --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
     --period 300 --statistics Average
   ```
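The checks above can be narrowed so that only the fields worth comparing are printed. The following is a minimal sketch, not a definitive tool: the function names `show_alarm_config` and `show_alarm_history` are hypothetical wrappers, and the `--query` expression assumes the alarm is a standard metric alarm (not a composite or metric-math alarm).

```bash
# Print only the alarm fields that must match the published metric.
# $1 = alarm name (for example MyAlarm from the steps above).
show_alarm_config() {
  aws cloudwatch describe-alarms --alarm-names "$1" \
    --query 'MetricAlarms[0].{Namespace:Namespace,MetricName:MetricName,Dimensions:Dimensions,Statistic:Statistic,Period:Period,EvaluationPeriods:EvaluationPeriods,DatapointsToAlarm:DatapointsToAlarm,Threshold:Threshold,ComparisonOperator:ComparisonOperator,TreatMissingData:TreatMissingData}' \
    --output json
}

# List the most recent state transitions recorded for the alarm.
# $1 = alarm name.
show_alarm_history() {
  aws cloudwatch describe-alarm-history --alarm-name "$1" \
    --history-item-type StateUpdate --max-records 10 \
    --query 'AlarmHistoryItems[].[Timestamp,HistorySummary]' --output text
}
```

Running `show_alarm_config MyAlarm` and comparing its namespace, metric name, and dimensions with the `list-metrics` output from step 1 shows whether the alarm is watching a metric that actually exists. An empty result from `show_alarm_history MyAlarm` suggests the alarm has not changed state within the retained history.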

Prevention

- Monitor alarm state changes with EventBridge
- Test alarms with put-metric-data before relying on them
- Use composite alarms for complex conditions
- Set up CloudWatch anomaly detection for dynamic thresholds
- Document all alarm configurations and their purpose
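The first two prevention items can be sketched with the AWS CLI as follows. The function names are hypothetical, and the rule name, namespace, and metric name passed to them are placeholders to be supplied by the caller. Note that `put-metric-data` cannot publish to namespaces beginning with `AWS/`, so synthetic datapoints only work for alarms on custom metrics; for alarms on AWS service metrics, forcing the state is the simpler way to exercise the notification path.

```bash
# Force an alarm into ALARM to exercise its actions (for example an SNS topic).
# The forced state is temporary: CloudWatch re-evaluates the alarm afterwards.
# $1 = alarm name.
force_alarm_state() {
  aws cloudwatch set-alarm-state --alarm-name "$1" \
    --state-value ALARM --state-reason "Manual test of notification path"
}

# Publish one synthetic datapoint to a custom metric.
# $1 = custom namespace, $2 = metric name, $3 = value.
publish_test_datapoint() {
  aws cloudwatch put-metric-data --namespace "$1" \
    --metric-name "$2" --value "$3"
}

# Create an EventBridge rule that matches every CloudWatch alarm state change.
# $1 = rule name.
create_alarm_state_rule() {
  aws events put-rule --name "$1" \
    --event-pattern '{"source":["aws.cloudwatch"],"detail-type":["CloudWatch Alarm State Change"]}'
}
```

For example, `force_alarm_state MyAlarm` should produce a notification if the alarm's actions are wired correctly. If the alarm is defined with dimensions, a test datapoint needs the same dimensions (passed with `--dimensions`) to be evaluated, and the EventBridge rule does nothing until a target is attached with `aws events put-targets`.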