Introduction

CloudWatch alarms only evaluate the datapoints they receive. If a metric stops publishing entirely, the alarm often moves to INSUFFICIENT_DATA instead of ALARM, which means the exact outage you care about can produce no page at all. The right fix depends on whether missing data should be treated as a failure or ignored as normal silence.
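
To show why silence may never page, the sketch below maps each treatMissingData setting to the state an alarm settles into when every datapoint in its evaluation range is missing. The function name state_when_silent is hypothetical. The mapping follows the general documented behavior of the four settings and does not cover partially missing windows.

```shell
# Hypothetical helper (not part of the AWS CLI).
# Prints the state an alarm ends up in when ALL datapoints in the
# evaluation range are missing, given its treatMissingData setting.
state_when_silent() {
  treat_missing="$1"   # breaching | notBreaching | ignore | missing
  current_state="$2"   # state the alarm was in before the silence
  case "$treat_missing" in
    breaching)    echo "ALARM" ;;              # silence counts as a bad datapoint
    notBreaching) echo "OK" ;;                 # silence counts as a good datapoint
    ignore)       echo "$current_state" ;;     # alarm keeps whatever state it had
    missing)      echo "INSUFFICIENT_DATA" ;;  # the default setting
    *)            echo "unknown treatMissingData value: $treat_missing" >&2
                  return 1 ;;
  esac
}
```

For example, `state_when_silent missing OK` prints INSUFFICIENT_DATA, which is the no-page case described above, while `state_when_silent breaching OK` prints ALARM.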

Symptoms

  • The alarm stays in INSUFFICIENT_DATA during an outage
  • No SNS or paging notification is sent even though the application is unhealthy
  • Metric graphs show gaps rather than threshold breaches
  • The alarm behaves differently for sparse custom metrics than for built-in AWS metrics

Common Causes

  • treatMissingData is left at missing (the default), notBreaching, or ignore for a heartbeat-style metric
  • The application or agent stopped publishing the metric entirely
  • Alarm period and evaluation windows do not match the metric publish frequency
  • The wrong namespace, dimension set, or statistic was selected for the alarm
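
One way to rule out the last cause is to ask CloudWatch which dimension sets it has actually recorded for the metric, then compare them with the alarm definition. This is a minimal sketch: the wrapper name list_metric_dimensions is hypothetical, and MyService and Heartbeat are the example names used elsewhere in this article. Note that list-metrics does not return metrics that have reported no data in the past two weeks.

```shell
# Hypothetical wrapper around `aws cloudwatch list-metrics`.
# Prints every dimension set recorded for the given namespace and metric name.
list_metric_dimensions() {
  namespace="$1"
  metric_name="$2"
  aws cloudwatch list-metrics \
    --namespace "$namespace" \
    --metric-name "$metric_name" \
    --query "Metrics[].Dimensions" \
    --output json
}
```

Calling `list_metric_dimensions MyService Heartbeat` and getting an empty list suggests the namespace or metric name in the alarm does not match what the application publishes.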

Step-by-Step Fix

  1. Inspect the alarm configuration and current state reason
     Start with the alarm definition so you can see whether missing data is currently treated as breaching, notBreaching, ignore, or missing.
bash
aws cloudwatch describe-alarms \
  --alarm-names api-heartbeat-missing \
  --query "MetricAlarms[0].[StateValue,StateReason,TreatMissingData,Period,EvaluationPeriods,DatapointsToAlarm]"
  2. Confirm how often the metric is actually being published
     A custom metric published every five minutes leaves four out of every five one-minute periods empty, so an alarm with a 60-second period sees mostly missing data.
bash
aws cloudwatch get-metric-statistics \
  --namespace MyService \
  --metric-name Heartbeat \
  --dimensions Name=Environment,Value=prod \
  --start-time 2026-04-10T05:00:00Z \
  --end-time 2026-04-10T06:00:00Z \
  --period 60 \
  --statistics Sum
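
The number of datapoints returned for the window gives a rough publish interval. The helper below is a sketch under assumptions: estimate_publish_interval is a hypothetical name, and the count would come from adding `--query "length(Datapoints)"` to the command above. Because each 60-second period yields at most one datapoint, the estimate cannot go below 60 seconds.

```shell
# Hypothetical helper: estimate the publish interval in seconds.
#   $1 = length of the queried window in seconds (3600 for one hour)
#   $2 = number of datapoints returned for that window
estimate_publish_interval() {
  window_seconds="$1"
  datapoint_count="$2"
  if [ "$datapoint_count" -le 0 ]; then
    # No datapoints at all: the metric is not being published in this window.
    echo "no datapoints in window" >&2
    return 1
  fi
  # Integer division is enough for a rough cadence check.
  echo $((window_seconds / datapoint_count))
}
```

With a one-hour window, 12 datapoints works out to about 300 seconds between publishes, and a count of zero means the metric was silent for the whole hour.
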
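  3. Set treatMissingData deliberately for the alarm type
     For heartbeat, health, or liveness metrics, missing data should usually be treated as breaching. Because put-metric-alarm replaces the whole alarm definition, resend the alarm's existing metric, threshold, and action settings (including any --alarm-actions) together with the new option. The metric, period, and threshold values shown here are examples only.
bash
# Example values: copy the real settings from describe-alarms first
aws cloudwatch put-metric-alarm \
  --alarm-name api-heartbeat-missing \
  --namespace MyService --metric-name Heartbeat --dimensions Name=Environment,Value=prod \
  --statistic Sum --period 300 --evaluation-periods 2 --threshold 1 \
  --comparison-operator LessThanThreshold --treat-missing-data breaching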
  4. Tune the period and evaluation windows to the publishing interval
     Use a period and evaluation count that fit the real metric cadence; otherwise, normal sparse metrics will flap between OK and INSUFFICIENT_DATA.
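
The arithmetic behind this step can be sketched as two small helpers. Both names are hypothetical, and the sketch assumes a standard-resolution metric, where the alarm period is a multiple of 60 seconds.

```shell
# Hypothetical helper: round a publish interval (seconds) up to the next
# multiple of 60 so every alarm period contains at least one expected datapoint.
alarm_period_for() {
  interval="$1"
  echo $(( (interval + 59) / 60 * 60 ))
}

# Hypothetical helper: length of the evaluation window in seconds,
# i.e. period multiplied by the number of evaluation periods.
evaluation_window_seconds() {
  period="$1"
  evaluation_periods="$2"
  echo $(( period * evaluation_periods ))
}
```

For a metric published every 90 seconds, `alarm_period_for 90` prints 120, which is the value to pass as --period. `evaluation_window_seconds 300 2` prints 600, meaning an alarm with a 300-second period and two evaluation periods judges roughly ten minutes of data at a time.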

Prevention

  • Treat missing data as breaching for heartbeat-style metrics
  • Keep metric publish frequency and alarm period explicitly aligned
  • Monitor both the business metric and the telemetry path that publishes it
  • Document which alarms should ignore silence and which should page on silence
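
A periodic audit can support the first and last points. The sketch below lists every metric alarm whose missing-data handling is anything other than breaching, so each one can be reviewed and documented as intentional. The function name is hypothetical, and alarms that never had the option set may show the default value or no value at all.

```shell
# Hypothetical audit helper: print name and treatMissingData setting for
# every metric alarm that does NOT treat missing data as breaching.
alarms_not_breaching_on_silence() {
  aws cloudwatch describe-alarms \
    --query "MetricAlarms[?TreatMissingData!='breaching'].[AlarmName,TreatMissingData]" \
    --output text
}
```

Any heartbeat-style alarm that appears in this output is a candidate for the fix in step 3; alarms on naturally sparse metrics can stay as they are once the choice is written down.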