Introduction
Grafana sends alert notifications to Slack via incoming webhooks. During an alert storm -- when many alerts fire simultaneously -- Grafana can exceed Slack's rate limit of one message per second per webhook URL. Rate-limited notifications are dropped, so critical alerts never reach the team exactly when they are needed most.
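Slack rejects over-limit webhook posts with HTTP 429 and a `Retry-After` header, which a sender can honor with a backoff loop. A minimal sketch for illustration -- the `send` callable and its `(status, retry_after)` return shape are assumptions, not Grafana internals:

```python
import time

def post_with_backoff(send, payload, max_attempts=5):
    """Retry a Slack webhook post when rate-limited (HTTP 429).

    `send` is any callable returning (status_code, retry_after_seconds);
    in real use it would wrap an HTTP POST to the webhook URL.
    """
    status = None
    for attempt in range(max_attempts):
        status, retry_after = send(payload)
        if status != 429:
            return status
        # Honor Slack's Retry-After header; fall back to exponential backoff
        time.sleep(retry_after if retry_after is not None else 2 ** attempt)
    return status
```

Grafana does not expose this loop directly, but the same logic applies to any custom notifier or proxy placed in front of the webhook.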
Symptoms
- Slack channel stops receiving Grafana alert notifications during incident
- Grafana logs show `Failed to send webhook to Slack: 429 Too Many Requests`
- The `alerting_notification_sent_total` metric shows fewer sent than fired alerts
- Slack returns `rate_limited` in the webhook response body
- Alert notifications resume after the storm subsides, but alerts fired during the gap are lost
Common Causes
- Alert storm from a widespread infrastructure issue firing hundreds of alerts simultaneously
- Slack webhook rate limit of 1 message per second per channel exceeded
- No notification deduplication or grouping configured in Grafana alert rules
- Multiple alert rules targeting the same Slack channel without coordination
- Grafana alert evaluation interval too short, generating notifications faster than Slack can process
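A back-of-envelope calculation shows how quickly these causes compound at Slack's one-message-per-second limit (the storm size and group count below are hypothetical figures for illustration):

```python
# Why a storm overwhelms a 1 msg/sec webhook (illustrative numbers)
SLACK_RATE_PER_SEC = 1
alerts_fired = 300

drain_seconds = alerts_fired / SLACK_RATE_PER_SEC
print(drain_seconds)  # 300.0 -- a 5-minute backlog; bursts beyond the
                      # limit are rejected with 429 and dropped

# Grouping collapses the storm into a handful of notifications
groups = 5  # e.g. 300 alerts sharing 5 distinct alertnames
print(alerts_fired // groups)  # 60 alerts summarized per grouped message
```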
Step-by-Step Fix
1. Check Grafana alert notification logs for rate limiting: Confirm the failure cause.

   ```bash
   journalctl -u grafana-server | grep -iE "slack|429|rate.limit" | tail -20
   ```

2. Configure alert notification grouping: Group related alerts into a single notification.

   ```yaml
   # Alertmanager routing config (if using external Alertmanager)
   route:
     receiver: slack
     group_by: ['alertname', 'severity', 'namespace']
     group_wait: 30s
     group_interval: 5m
     repeat_interval: 4h
   ```

3. Use Slack rate-limit handling with retry: Configure Grafana to retry rate-limited notifications.

   ```ini
   # grafana.ini
   [unified_alerting]
   evaluation_timeout = 30s

   # Legacy alerting: retry attempts and notification timeout
   [alerting]
   max_attempts = 5
   notification_timeout_seconds = 30
   ```

4. Implement alert deduplication at the rule level: Reduce notification volume using Grafana's built-in deduplication.

   - Set alert rule evaluation interval to 1m minimum
   - Use the "Pending for" period to filter transient alerts
   - Consolidate multiple similar alert rules into one

5. Add a secondary notification channel for critical alerts: Ensure alerts are not lost by configuring PagerDuty or email as a backup channel and routing critical alerts to both Slack and PagerDuty.
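The backup-channel routing in the last step can be sketched as Alertmanager configuration; the receiver name, webhook URL, and integration key below are placeholders:

```yaml
# Fan critical alerts out to both Slack and PagerDuty (sketch)
receivers:
  - name: critical-fanout
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX  # placeholder webhook
        channel: '#incidents'
    pagerduty_configs:
      - service_key: '<pagerduty-integration-key>'     # placeholder key
route:
  receiver: slack
  routes:
    - matchers: ['severity="critical"']
      receiver: critical-fanout
```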
Prevention
- Configure alert grouping with appropriate `group_wait` and `group_interval` to batch notifications
- Design alert routing with Slack's webhook rate limit in mind -- at most 60 notifications per minute per channel
- Use separate Slack channels for different alert severity levels to distribute load
- Implement alert storm detection that temporarily aggregates alerts during high-volume periods
- Monitor notification delivery success rate and alert when the failure rate exceeds 5%
- Use Grafana's notification policies to route alerts through multiple channels for critical severities
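The delivery-failure-rate check above can be expressed as a Prometheus alert rule. The metric names are assumptions based on Grafana's `grafana_alerting_notification_*_total` counters and should be verified against your Grafana version's `/metrics` endpoint:

```yaml
groups:
  - name: meta-alerts
    rules:
      - alert: NotificationDeliveryFailing
        # Failure rate above 5% over 10m; metric names are assumptions --
        # confirm them against your Grafana /metrics output
        expr: |
          sum(rate(grafana_alerting_notification_failed_total[10m]))
            /
          sum(rate(grafana_alerting_notification_sent_total[10m])) > 0.05
        for: 5m
        labels:
          severity: warning
```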