Introduction
Grafana's alerting system notifies teams when metrics cross defined thresholds. When alerting fails, you may see errors like "failed to evaluate alert rule," "notification failed," or alerts that never fire despite conditions being met. These issues can be caused by datasource problems, query errors, notification channel misconfiguration, or alert rule conflicts.
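A quick way to narrow down which failure mode you are hitting is to search the Grafana server logs for these error strings. A minimal sketch, assuming a systemd installation with the default grafana-server unit name:

```bash
# Scan recent Grafana logs for common alerting failure messages
# (unit name and time window are assumptions; adjust for your install)
journalctl -u grafana-server --since "1 hour ago" \
  | grep -iE "failed to evaluate|notification failed|contact point"
```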
Symptoms
- Alert rules show "Error" or "Unknown" state instead of Normal/Alerting
- Error: "failed to evaluate queries" in alert rule status
- Notifications are not sent when alerts fire
- Error: "contact point failed" or "notification delivery failed"
- Alerts fire but immediately resolve, creating notification spam
- Alert history shows no evaluations despite conditions being met
Common Causes
- Datasource query returns no data or invalid format
- Alert rule query references deleted or renamed metrics
- Contact point (notification channel) has invalid configuration
- Alert rule evaluation timeout is too short for complex queries
- Datasource is unreachable during alert evaluation
- Silence or inhibition rules are blocking notifications
- Alert manager configuration is invalid
Step-by-Step Fix
Datasource Query Issues
1. Test the alert query directly in Explore:

   ```bash
   # Navigate to Explore > Run the exact query from the alert rule
   # Check for errors or empty results
   ```

2. Verify the datasource is healthy:

   ```bash
   curl -s "http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up" | jq .
   ```

3. Check for query syntax errors in the alert rule:
   - Navigate to Alerting > Alert rules
   - Click on the failing rule
   - Check "Preview" for query errors
   - Ensure the query returns the expected format (instant vs range)

4. Fix "no data" handling. In the alert rule, set the "No data" behavior that matches your intent (a quick API check follows this list):
   - "No Data" = No Data: the rule enters the NoData state
   - "No Data" = Alerting: the rule fires an alert
   - "No Data" = OK: the rule resolves to Normal
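To confirm how a specific rule is configured to handle no-data and evaluation errors, the provisioning API exposes both settings. A minimal sketch, assuming basic-auth admin credentials and the hypothetical rule UID alert-1:

```bash
# Inspect a rule's no-data and error handling via the provisioning API
# (credentials and the UID "alert-1" are assumptions; substitute your own)
curl -s -u admin:password http://localhost:3000/api/v1/provisioning/alert-rules/alert-1 \
  | jq '{title, noDataState, execErrState}'
```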
Contact Point Issues
1. Verify contact point configuration:

   ```bash
   curl -s -u admin:password http://localhost:3000/api/v1/provisioning/contact-points | jq .
   ```

2. Test contact point delivery:
   - Navigate to Alerting > Contact points
   - Click "Test" on the contact point
   - Verify the test notification is received

3. For Slack contact points, verify the webhook URL:

   ```bash
   curl -X POST -H 'Content-type: application/json' \
     --data '{"text":"Test alert from Grafana"}' \
     https://hooks.slack.com/services/YOUR/WEBHOOK/URL
   ```

4. For email contact points, check the SMTP configuration:

   ```ini
   # In grafana.ini
   [smtp]
   enabled = true
   host = smtp.example.com:587
   user = noreply@example.com
   password = smtp-password
   from_address = grafana@example.com
   ```

5. For PagerDuty contact points, verify the routing key:

   ```bash
   curl -X POST https://events.pagerduty.com/v2/enqueue \
     -H 'Content-Type: application/json' \
     -d '{
       "routing_key": "your-routing-key",
       "event_action": "trigger",
       "dedup_key": "test-alert",
       "payload": {"summary": "Test alert", "source": "grafana-test", "severity": "critical"}
     }'
   ```
Alert Rule Evaluation Issues
1. Increase the evaluation timeout for complex queries:

   ```ini
   # In grafana.ini
   [unified_alerting]
   evaluation_timeout = 30s
   max_attempts = 3
   ```

2. Check the alert rule evaluation logs:

   ```bash
   journalctl -u grafana-server -f | grep -iE "alert|eval"
   ```

3. Verify the rule evaluation frequency is appropriate (a provisioning sketch follows this list):
   - The rule group interval should match the data resolution
   - The query's relative time range should cover at least one evaluation interval
   - For 1-minute resolution data, use a 1-minute evaluation interval
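As a concrete illustration of matching the evaluation interval to the data, here is a minimal file-provisioning sketch (group name, folder, and datasource UID are assumptions) in which 1-minute data is evaluated every minute and the query window spans five minutes:

```yaml
# Hypothetical rule group: 1m evaluation interval over a 5m query window
apiVersion: 1
groups:
  - orgId: 1
    name: cpu-alerts              # assumed group name
    folder: example-folder        # assumed folder
    interval: 1m                  # evaluate every minute, matching 1m-resolution data
    rules:
      - uid: cpu-high
        title: High CPU Usage
        condition: A
        for: 5m                   # a multiple of the evaluation interval
        data:
          - refId: A
            relativeTimeRange:
              from: 300           # 5-minute window, longer than one evaluation interval
              to: 0
            datasourceUid: prometheus   # assumed datasource UID
            model:
              expr: rate(cpu_usage[5m]) > 0.8
              refId: A
```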
Alert Manager Issues
1. Check the notification policy configuration:

   ```bash
   curl -s -u admin:password http://localhost:3000/api/v1/provisioning/policies | jq .
   ```

2. Verify no silences are blocking notifications (an API-based check follows this section):
   - Navigate to Alerting > Silences
   - Check for active silences
   - Delete or expire any unintended silences

3. Check for inhibition rules that might suppress notifications by reviewing the full Alertmanager configuration:

   ```bash
   curl -s -u admin:password http://localhost:3000/api/alertmanager/grafana/config/api/v1/alerts | jq '.alertmanager_config'
   ```
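If you prefer to check silences from the command line, Grafana's built-in Alertmanager exposes an Alertmanager-compatible HTTP API. A minimal sketch, assuming basic-auth admin credentials:

```bash
# List silences on Grafana's built-in Alertmanager and show their state and expiry
curl -s -u admin:password http://localhost:3000/api/alertmanager/grafana/api/v2/silences \
  | jq '.[] | {id, state: .status.state, comment, endsAt}'
```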
Provisioning Issues
1. Validate alert provisioning files. Grafana validates them at startup, so check the server logs for provisioning errors:

   ```bash
   # Syntax errors in provisioning files are reported when Grafana starts
   journalctl -u grafana-server | grep -i "provisioning"
   ```

2. For YAML-provisioned alerts, verify the file syntax and indentation:

   ```yaml
   # Example alert rule provisioning file
   apiVersion: 1
   groups:
     - orgId: 1
       name: system-alerts
       folder: system
       interval: 1m
       rules:
         - uid: alert-1
           title: High CPU Usage
           condition: A
           data:
             - refId: A
               relativeTimeRange:
                 from: 600
                 to: 0
               datasourceUid: prometheus
               model:
                 expr: rate(cpu_usage[5m]) > 0.8
                 refId: A
   ```
Verification
1. Verify alert evaluation:

   ```bash
   curl -s -u admin:password http://localhost:3000/api/prometheus/grafana/api/v1/rules \
     | jq '.data.groups[].rules[] | {name: .name, state: .state}'
   ```

2. Check alert history:
   - Navigate to Alerting > Alert history
   - Verify evaluations are occurring
   - Check for patterns in failures

3. Confirm notifications are delivered (a delivery test is sketched below):
   - Trigger a test alert
   - Verify the notification appears in the contact point
   - Check the contact point response logs
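To exercise routing and delivery end to end without waiting for a real threshold breach, you can post a synthetic alert to Grafana's built-in Alertmanager, assuming your Grafana version exposes the Alertmanager-compatible POST endpoint; the label values below are assumptions and should match your notification policy:

```bash
# Post a synthetic alert so the notification policy routes it to a contact point
# (credentials, labels, and annotation text are assumptions)
curl -s -u admin:password -X POST \
  -H 'Content-Type: application/json' \
  http://localhost:3000/api/alertmanager/grafana/api/v2/alerts \
  -d '[{
        "labels": {"alertname": "DeliveryTest", "severity": "critical"},
        "annotations": {"summary": "Synthetic alert for notification delivery testing"}
      }]'
```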