Introduction
CloudWatch alarms only evaluate the datapoints they receive. If a metric stops publishing entirely, the alarm often moves to INSUFFICIENT_DATA instead of ALARM, which means the exact outage you care about can produce no page at all. The right fix depends on whether missing data should be treated as a failure or ignored as normal silence.
Symptoms
- The alarm stays in `INSUFFICIENT_DATA` during an outage
- No SNS or paging notification is sent even though the application is unhealthy
- Metric graphs show gaps rather than threshold breaches
- The alarm behaves differently for sparse custom metrics than for built-in AWS metrics
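You can confirm the first two symptoms from the command line with the sketch below. The shell function names are hypothetical. The AWS CLI options it uses are standard: `describe-alarms --state-value`, and `describe-alarm-history` with `--history-item-type StateUpdate` and `--scan-by`. `api-heartbeat-missing` is the example alarm used throughout this article. The sketch assumes the CLI is configured with credentials that can read CloudWatch alarms.

```bash
# Hypothetical helpers for confirming the "stuck in INSUFFICIENT_DATA" symptom.
# Assumes the AWS CLI is installed and allowed to call cloudwatch:Describe*.

# Print the names of every metric alarm currently in INSUFFICIENT_DATA.
list_insufficient_data_alarms() {
  aws cloudwatch describe-alarms \
    --state-value INSUFFICIENT_DATA \
    --query 'MetricAlarms[].AlarmName' \
    --output text
}

# Print the 20 most recent state transitions for one alarm, newest first,
# so you can see whether it moved to INSUFFICIENT_DATA instead of ALARM.
alarm_state_history() {
  aws cloudwatch describe-alarm-history \
    --alarm-name "$1" \
    --history-item-type StateUpdate \
    --scan-by TimestampDescending \
    --query 'AlarmHistoryItems[:20].[Timestamp,HistorySummary]' \
    --output text
}

# Example usage:
#   list_insufficient_data_alarms
#   alarm_state_history api-heartbeat-missing
```

If the alarm moved to `INSUFFICIENT_DATA` rather than `ALARM` when the outage began, that confirms the missing-data problem described in the introduction.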
Common Causes
- `treatMissingData` is left at the default (`missing`) or another non-breaching value for a heartbeat-style metric
- The application or agent stopped publishing the metric entirely
- Alarm period and evaluation windows do not match the metric publish frequency
- The wrong namespace, dimension set, or statistic was selected for the alarm
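A short audit can check the first and last causes. The function names in this sketch are hypothetical.

The first function lists alarms whose `TreatMissingData` is anything other than `breaching`. Alarms that never set the field are included, because the JMESPath comparison treats an absent field as not equal to `breaching`. Review that list for heartbeat-style alarms.

The second function lists the dimension sets CloudWatch actually holds for a metric, so you can compare them with the alarm's namespace and dimensions. `list-metrics` only returns metrics that reported data in roughly the last two weeks, so an empty result can itself mean the publisher stopped.

```bash
# Hypothetical audit helpers for the common causes listed above.
# Assumes the AWS CLI is installed and allowed to call cloudwatch:Describe*/List*.

# Print every metric alarm whose treatMissingData is not "breaching",
# including alarms that never set it (they fall back to "missing").
alarms_not_breaching_on_missing() {
  aws cloudwatch describe-alarms \
    --query "MetricAlarms[?TreatMissingData!='breaching'].[AlarmName,TreatMissingData]" \
    --output text
}

# Print the dimension sets CloudWatch has recorded for a metric, to compare
# against the namespace and dimensions configured on the alarm.
metric_dimension_sets() {
  aws cloudwatch list-metrics \
    --namespace "$1" \
    --metric-name "$2" \
    --query 'Metrics[].Dimensions' \
    --output json
}

# Example usage:
#   alarms_not_breaching_on_missing
#   metric_dimension_sets MyService Heartbeat
```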
Step-by-Step Fix
1. Inspect the alarm configuration and current state reason

Start with the alarm definition. It shows the current state, the reason CloudWatch recorded for that state, and whether missing data is treated as `breaching`, `notBreaching`, `ignore`, or `missing`.

```bash
aws cloudwatch describe-alarms \
  --alarm-names api-heartbeat-missing \
  --query "MetricAlarms[0].[StateValue,StateReason,TreatMissingData,Period,EvaluationPeriods,DatapointsToAlarm]"
```

2. Confirm how often the metric is actually being published

A custom metric sent every five minutes will look missing to an alarm that expects one-minute datapoints.
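```bash
aws cloudwatch get-metric-statistics \
  --namespace MyService \
  --metric-name Heartbeat \
  --dimensions Name=Environment,Value=prod \
  --start-time 2026-04-10T05:00:00Z \
  --end-time 2026-04-10T06:00:00Z \
  --period 60 \
  --statistics Sum \
  --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Sum]' --output text
```

The `--query` expression sorts the datapoints by timestamp so gaps are easy to spot. With `--period 60`, a metric that publishes once a minute returns about 60 datapoints for this one-hour window. A metric that publishes every five minutes returns about 12, and a one-minute alarm treats the minutes in between as missing data.

3. Set `treatMissingData` deliberately for the alarm type

For heartbeat, health, or liveness metrics, missing data should usually be treated as breaching.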
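`put-metric-alarm` overwrites the existing alarm's entire configuration, so pass every setting the alarm needs, not only `--treat-missing-data`. The alarm, metric, and dimension names below are the ones used above. The remaining values are illustrative placeholders: the threshold, statistic, evaluation counts, and the `SNS_TOPIC_ARN` variable. Substitute the alarm's real settings.

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name api-heartbeat-missing --alarm-actions "$SNS_TOPIC_ARN" \
  --namespace MyService --metric-name Heartbeat --dimensions Name=Environment,Value=prod \
  --statistic Sum --period 60 --evaluation-periods 3 --datapoints-to-alarm 3 \
  --threshold 1 --comparison-operator LessThanThreshold \
  --treat-missing-data breaching
```

4. Tune the period and evaluation windows to the publishing interval

Use a period and evaluation count that match the real metric cadence. Otherwise, normal sparse metrics will flap between `OK` and `INSUFFICIENT_DATA`.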
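Matching the alarm window to the publishing cadence is simple arithmetic, sketched here as a hypothetical helper. It picks a period at least as long as the publish interval, rounded up to a multiple of 60 seconds, which CloudWatch requires for alarm periods of one minute or longer. It then picks enough evaluation periods to cover how long silence should last before paging.

This is one rule of thumb under those assumptions, not an AWS-defined formula. If datapoints jitter across period boundaries, a longer period or a `--datapoints-to-alarm` value below `--evaluation-periods` gives more slack.

```bash
# Hypothetical helper: given the publish interval and how long silence may
# last before paging (both in seconds), print matching alarm window flags.
alarm_window_for() {
  local publish_interval="$1" tolerate="$2" period evals
  # Round the period up to a whole number of minutes (minimum one minute),
  # so a healthy publisher lands roughly one datapoint per period.
  period=$(( (publish_interval + 59) / 60 * 60 ))
  if [ "$period" -lt 60 ]; then
    period=60
  fi
  # Enough periods to cover the tolerated silence, rounding up, minimum 1.
  evals=$(( (tolerate + period - 1) / period ))
  if [ "$evals" -lt 1 ]; then
    evals=1
  fi
  printf '%s\n' "--period $period --evaluation-periods $evals"
}

# Example usage:
#   alarm_window_for 300 900   # prints: --period 300 --evaluation-periods 3
```

For the five-minute custom metric from step 2, those flags can replace the one-minute `--period` and `--evaluation-periods` values in the `put-metric-alarm` command from step 3.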
Prevention
- Treat missing data as breaching for heartbeat-style metrics
- Keep metric publish frequency and alarm period explicitly aligned
- Monitor both the business metric and the telemetry path that publishes it
- Document which alarms should ignore silence and which should page on silence
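One way to keep that documentation from drifting is to encode the policy as a function that alarm definitions call. The categories and the mapping below are hypothetical examples of such a policy. `breaching`, `notBreaching`, and `ignore` are real `treatMissingData` values. `missing`, the default, is deliberately never returned, so every alarm makes an explicit choice.

```bash
# Hypothetical silence policy: which treatMissingData value each alarm
# category should use. Category names are examples; adapt to your own.
treat_missing_data_for() {
  case "$1" in
    heartbeat|health|liveness)
      # Silence means the publisher is down: page on it.
      printf '%s\n' breaching ;;
    error-count|sparse-event)
      # Silence means nothing happened: treat it as within the threshold.
      printf '%s\n' notBreaching ;;
    batch-job)
      # Publishes only during runs: keep the current alarm state between runs.
      printf '%s\n' ignore ;;
    *)
      # Unknown categories fail loudly so no alarm silently gets the default.
      printf 'unknown alarm category: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Example usage:
#   aws cloudwatch put-metric-alarm ... --treat-missing-data "$(treat_missing_data_for heartbeat)"
```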