Introduction
Alertmanager routes alerts to notification receivers, including webhooks for custom integrations. When the webhook endpoint is unreachable, returns an error, or times out, Alertmanager retries with exponential backoff. If every retry fails, the notification is lost and the operations team remains unaware of an active incident.
Symptoms
- Alertmanager logs show `webhook notification failed` or `context deadline exceeded`
- The `alertmanager_notifications_failed_total` metric increases for the webhook receiver
- Alerts fire but no notifications arrive at the downstream system
- Webhook endpoint shows no incoming requests from Alertmanager
- Error message: `error="Post "http://webhook:8080/alerts": dial tcp: connection refused"`
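To look for these signatures in practice, a small filter over the Alertmanager log stream can help. The sketch below is illustrative only: the function name `find_webhook_failures` and the `app=alertmanager` pod label in the usage comment are assumptions, not part of Alertmanager, and the exact log wording may differ between versions.

```bash
# Keep only log lines (read from stdin) that contain one of the
# failure signatures listed in the symptoms above.
find_webhook_failures() {
  grep -E 'webhook notification failed|context deadline exceeded|connection refused'
}

# Hypothetical usage, assuming Alertmanager runs in Kubernetes with
# the label app=alertmanager:
#   kubectl logs -l app=alertmanager --tail=500 | find_webhook_failures
```

The function only reads stdin, so it works equally well on a saved log file or on `journalctl` output.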
Common Causes
- Webhook endpoint crashed, or was redeployed without a matching update to the Alertmanager configuration
- Network policy or firewall blocking Alertmanager from reaching the webhook URL
- Webhook returning non-2xx HTTP status codes, causing Alertmanager to mark delivery as failed
- Webhook endpoint TLS certificate expired, or is self-signed and its CA is not configured in Alertmanager (`tls_config.ca_file`)
- Alertmanager retry queue full due to persistent webhook failures, dropping new notifications
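Several of these causes can be told apart by the HTTP status that a test request returns. The following is a minimal sketch, assuming `curl` is available on the Alertmanager host; `classify_status` and `probe_webhook` are hypothetical helper names, and the payload is the same empty test body used in the fix steps. It relies on `curl` reporting `000` for `%{http_code}` when no HTTP response was received at all.

```bash
# Map an HTTP status code (as printed by curl's %{http_code}) to a
# rough failure category.
classify_status() {
  case "$1" in
    000) echo "unreachable" ;;          # refused, DNS, TLS, or timeout
    2??) echo "ok" ;;                   # delivery would be accepted
    *)   echo "rejected (HTTP $1)" ;;   # non-2xx counts as a failure
  esac
}

# Send an empty test payload to the webhook URL and classify the result.
probe_webhook() {
  local url="$1" code
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' \
    -X POST -H 'Content-Type: application/json' \
    -d '{"version":"4","status":"firing","alerts":[]}' "$url")
  classify_status "$code"
}

# Hypothetical usage:
#   probe_webhook http://webhook:8080/alerts
```

An `unreachable` result points toward a crashed endpoint, a network policy, or a TLS problem, while `rejected` means the request arrived and the webhook itself refused it.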
Step-by-Step Fix
1. Check Alertmanager notification failure metrics: Identify the failing webhook.
   ```bash
   curl -s http://alertmanager:9093/metrics | grep alertmanager_notifications_failed_total
   ```
2. Test webhook endpoint connectivity from Alertmanager: Verify the endpoint is reachable.
   ```bash
   # From Alertmanager pod or server
   curl -v -X POST http://webhook:8080/alerts \
     -H "Content-Type: application/json" \
     -d '{"version":"4","status":"firing","alerts":[]}'
   ```
3. Check webhook endpoint logs for errors: Identify if requests are arriving and failing.
   ```bash
   kubectl logs -l app=webhook-receiver --tail=50
   ```
4. Update Alertmanager webhook configuration: Fix the webhook URL or add TLS configuration.
   ```yaml
   # alertmanager.yml
   receivers:
     - name: 'custom-webhook'
       webhook_configs:
         - url: 'http://webhook-receiver:8080/alerts'
           send_resolved: true
           http_config:
             tls_config:
               ca_file: /etc/alertmanager/ca.pem
   ```
5. Reload Alertmanager configuration: Apply the updated configuration.
   ```bash
   curl -X POST http://alertmanager:9093/-/reload
   # Or send SIGHUP
   kill -HUP $(pidof alertmanager)
   ```
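When several receivers are configured, the raw `grep` output from the first step can be noisy. As a rough sketch, the failure counter can be summed per `integration` label to see which receiver type is failing. The helper name `failures_by_integration` is hypothetical, and the parsing assumes the plain-text metrics format in which the sample value is the last field on the line.

```bash
# Sum alertmanager_notifications_failed_total per integration label.
# Reads the /metrics text from stdin, prints "<integration> <count>".
failures_by_integration() {
  awk '
    /^alertmanager_notifications_failed_total[{]/ {
      # Extract the value of integration="..." from the label set.
      if (match($0, /integration="[^"]*"/)) {
        name = substr($0, RSTART + 13, RLENGTH - 14)
        total[name] += $NF   # other labels on the same series are summed
      }
    }
    END { for (name in total) printf "%s %d\n", name, total[name] }
  ' | sort
}

# Hypothetical usage:
#   curl -s http://alertmanager:9093/metrics | failures_by_integration
```

Running it before and after the reload gives a simple check: if the `webhook` count stops growing, deliveries are succeeding again.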
Prevention
- Add health checks to the webhook endpoint and monitor them, so an outage is detected before notifications start failing
- Configure multiple notification receivers for critical alerts (webhook + email + PagerDuty)
- Monitor `alertmanager_notifications_failed_total` and alert on sustained failure rates
- Tune the route's timing settings (`group_wait`, `group_interval`, `repeat_interval`) to your webhook's reliability; Alertmanager's retry backoff itself is built in rather than configured per receiver
- Deploy webhook receivers with high availability and auto-scaling to handle alert storms
- Test end-to-end alert delivery regularly using synthetic alerts in staging environments
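For the synthetic-alert check in the last item, one possible approach is to post a clearly labelled test alert to Alertmanager's v2 API and confirm that it reaches the webhook. The sketch below assumes the `/api/v2/alerts` endpoint is reachable; the function names, the `synthetic` label, and the default alert name are illustrative choices, not conventions defined by Alertmanager.

```bash
# Build a JSON payload containing one synthetic alert.
# $1: alert name (defaults to a hypothetical SyntheticDeliveryTest).
synthetic_alert_payload() {
  local name="${1:-SyntheticDeliveryTest}"
  printf '[{"labels":{"alertname":"%s","severity":"info","synthetic":"true"},"annotations":{"summary":"End-to-end delivery test"}}]\n' "$name"
}

# Post the synthetic alert to Alertmanager.
# $1: Alertmanager base URL, $2: optional alert name.
send_synthetic_alert() {
  local base_url="$1"
  synthetic_alert_payload "${2:-}" |
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$base_url/api/v2/alerts"
}

# Hypothetical usage in a staging environment:
#   send_synthetic_alert http://alertmanager:9093
```

The route for the test alert must lead to the webhook receiver, and the `synthetic` label makes it easy for the downstream system to discard these alerts instead of paging anyone.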