Introduction
An observability migration can publish a new alert webhook while Prometheus Alertmanager keeps sending notifications to the old one. Alerting looks healthy, but the failure takes one of three forms: incidents route to the retired endpoint, one receiver path uses the new webhook while another still follows the previous route tree, or failures begin only after the old integration is disabled. The underlying reason is that receivers, routing rules, and HA node config often drift separately.
Treat this as an alert-routing problem rather than a generic Prometheus outage. Start by checking which receiver and webhook URL an affected alert actually uses, because migrations often validate the new integration with a test payload while live alerts continue to follow older route and receiver definitions.
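One way to see which receiver live alerts actually match is to read them back from Alertmanager's v2 HTTP API, which lists the matched receiver names for each active alert. The following is a minimal sketch, not a definitive tool: the function names are illustrative, and the base URL and receiver name in the usage note are hypothetical. The API reports receiver names only, so the webhook URL behind a receiver still has to be looked up in the loaded configuration.

```python
import json
import urllib.request


def fetch_alerts(base_url, timeout=10):
    """Fetch active alerts from an Alertmanager node via GET /api/v2/alerts."""
    url = f"{base_url.rstrip('/')}/api/v2/alerts"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def receivers_by_alert(alerts):
    """Map (alertname, fingerprint) to the sorted receiver names it matched."""
    routed = {}
    for alert in alerts:
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        key = (name, alert.get("fingerprint", ""))
        routed[key] = sorted(r.get("name", "") for r in alert.get("receivers", []))
    return routed


def alerts_on_receiver(alerts, receiver_name):
    """Return the label sets of alerts that were routed to receiver_name."""
    return [
        alert.get("labels", {})
        for alert in alerts
        if any(r.get("name") == receiver_name for r in alert.get("receivers", []))
    ]
```

For example, `alerts_on_receiver(fetch_alerts("http://alertmanager.example.internal:9093"), "legacy-webhook")` would list the labels of every live alert still matched to a receiver named `legacy-webhook`, assuming that is what the retired receiver is called.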
Symptoms
- Prometheus Alertmanager still sends alerts to the old webhook after migration
- Alerts fire, but notifications arrive only at the retired integration
- One alert route or receiver uses the new webhook while another still uses the previous one
- Delivery failures begin only after the old webhook, token, or DNS record is removed
- The new integration is healthy, but production alerts never use it consistently
- The issue started after moving Alertmanager, webhook handlers, or incident-management integrations
Common Causes
- The receiver definition still points to the old webhook URL
- Route matching sends some alerts to a legacy receiver tree
- Secret injection or mounted config still restores the previous endpoint or token
- One Alertmanager HA node still runs older configuration than the others
- GitOps, config reload, or container rollout did not actually apply the intended receiver update
- Validation confirmed the new webhook worked manually but did not verify which receiver live alerts actually matched
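The first two causes can be checked mechanically against the rendered configuration file. The sketch below assumes the Alertmanager config has already been parsed into a Python dict (for instance with a YAML loader, which is not shown) and that the retired endpoint can be identified by a substring of its URL. The function names and the marker value are illustrative. It inspects only the `url` field of `webhook_configs`, so endpoints supplied through files or injected secrets need a separate check.

```python
def find_stale_webhooks(config, old_marker):
    """Return (receiver_name, url) pairs whose webhook URL contains old_marker."""
    stale = []
    for receiver in config.get("receivers") or []:
        for hook in receiver.get("webhook_configs") or []:
            url = hook.get("url") or ""
            if old_marker in url:
                stale.append((receiver.get("name", "<unnamed>"), url))
    return stale


def receivers_in_route_tree(route):
    """Collect every receiver name referenced anywhere in the route tree."""
    names = set()
    if route.get("receiver"):
        names.add(route["receiver"])
    for child in route.get("routes") or []:
        names |= receivers_in_route_tree(child)
    return names


def stale_receivers_in_use(config, old_marker):
    """Return stale receivers that at least one route still points to."""
    stale_names = {name for name, _ in find_stale_webhooks(config, old_marker)}
    used = receivers_in_route_tree(config.get("route") or {})
    return sorted(stale_names & used)
```

A stale receiver that no route references is harmless clutter; one that appears in the output of `stale_receivers_in_use` is a path live alerts can still take to the old webhook.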
Step-by-Step Fix
- Capture one affected alert and record the route labels, matched receiver, webhook URL, and Alertmanager node handling it, because the live routing path determines where incident traffic really lands.
- Compare that active alert path with the intended post-migration design, because one stale receiver or route matcher can keep ongoing incidents tied to the retired integration.
- Review receiver definitions, route trees, secret sources, HA node config, and deployment manifests for references to the old webhook, because Alertmanager delivery depends on both routing logic and the config each node actually loaded.
- Check each route branch, inhibition rule, and HA node separately if behavior differs, because migrations often update one receiver path while another still follows the previous endpoint.
- Update the authoritative receiver configuration and ensure every active Alertmanager node loads the intended route tree, because creating the new webhook alone does not retarget existing notifications.
- Trigger a controlled alert and confirm the intended receiver and webhook process it, because a firing rule does not prove the right integration handled delivery.
- Verify the old webhook no longer receives alert traffic from any active Alertmanager path, because split notification flows can remain hidden while both integrations stay reachable.
- Review authentication headers, TLS trust, and route matching order if delivery still fails, because the destination can be correct while webhook trust or routing precedence still blocks the new path.
- Document which team owns Alertmanager routing, secret delivery, and migration validation so future notification cutovers verify the actual runtime receiver before retiring the previous webhook.
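To confirm that every HA node loaded the same configuration, one option is to compare the config text each node reports through `GET /api/v2/status`. This is a rough sketch under stated assumptions: the node URLs in the usage note are hypothetical, the function names are illustrative, and Alertmanager masks secret values in this output (depending on version, the webhook URL may be among them). Matching hashes therefore show that the route tree and receiver names agree, not necessarily that the injected webhook URLs do.

```python
import hashlib
import json
import urllib.request
from collections import defaultdict


def fetch_status(base_url, timeout=10):
    """Fetch node status, including the loaded config text, from /api/v2/status."""
    url = f"{base_url.rstrip('/')}/api/v2/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def group_nodes_by_config(statuses):
    """Group node names by the SHA-256 digest of the config each one reports."""
    groups = defaultdict(list)
    for node, status in statuses.items():
        text = (status.get("config") or {}).get("original", "")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(node)
    return dict(groups)


def config_is_consistent(statuses):
    """Return True when all nodes report byte-identical config text."""
    return len(group_nodes_by_config(statuses)) <= 1
```

A possible use is to build `statuses = {url: fetch_status(url) for url in node_urls}` from a list such as `["http://am-0.example.internal:9093", "http://am-1.example.internal:9093"]` and then inspect `group_nodes_by_config(statuses)`. More than one group means at least one node is running a different route tree and should be reloaded or redeployed before the old webhook is retired.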