Introduction

Grafana's alerting system notifies teams when metrics cross defined thresholds. When alerting fails, you may see errors like "failed to evaluate alert rule," "notification failed," or alerts that never fire despite conditions being met. These issues can be caused by datasource problems, query errors, notification channel misconfiguration, or alert rule conflicts.

Symptoms

  • Alert rules show "Error" or "Unknown" state instead of Normal/Alerting
  • Error: "failed to evaluate queries" in alert rule status
  • Notifications are not sent when alerts fire
  • Error: "contact point failed" or "notification delivery failed"
  • Alerts fire but immediately resolve, creating notification spam
  • Alert history shows no evaluations despite conditions being met

Common Causes

  • Datasource query returns no data or invalid format
  • Alert rule query references deleted or renamed metrics
  • Contact point (notification channel) has invalid configuration
  • Alert rule evaluation timeout is too short for complex queries
  • Datasource is unreachable during alert evaluation
  • Silence or inhibition rules are blocking notifications
  • Alert manager configuration is invalid

Step-by-Step Fix

Datasource Query Issues

  1. Test the alert query directly in Explore:

     - Navigate to Explore and run the exact query from the alert rule
     - Check for errors or empty results

  2. Verify the datasource is healthy:

     ```bash
     curl -s "http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up" | jq .
     ```

  3. Check for query syntax errors in the alert rule:

     - Navigate to Alerting > Alert rules
     - Click on the failing rule
     - Check "Preview" for query errors
     - Ensure the query returns the expected format (instant vs. range)

  4. Fix "No data" handling. In the alert rule, set the "No data" behavior to one of:

     - No Data (keeps the current state)
     - Alerting (fires the alert)
     - OK (resolves the alert)
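
The empty-result check above can be scripted. This is a minimal sketch assuming a Prometheus datasource proxied at id 1 and Grafana on localhost:3000; `check_result` and the defaults are illustrative helpers, not Grafana tooling:

```shell
#!/bin/sh
# Sketch: run the alert query through the datasource proxy and flag
# empty results, which surface as "No data" at evaluation time.
# GRAFANA_URL and datasource id 1 are assumptions; adjust for your setup.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
QUERY="${1:-up}"

check_result() {
  # A Prometheus instant query with no matching series returns "result":[]
  case "$1" in
    *'"result":[]'*) echo "EMPTY: query returned no series" ;;
    *'"result":'*)   echo "OK: query returned data" ;;
    *)               echo "ERROR: unexpected or empty response" ;;
  esac
}

check_result "$(curl -s "$GRAFANA_URL/api/datasources/proxy/1/api/v1/query?query=$QUERY")"
```

An EMPTY result means the rule will take whichever "No data" behavior is configured above, so fix the query (or the behavior) accordingly.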

Contact Point Issues

  1. Verify the contact point configuration:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/v1/provisioning/contact-points | jq .
     ```

  2. Test contact point delivery:

     - Navigate to Alerting > Contact points
     - Click "Test" on the contact point
     - Verify the test notification is received

  3. For Slack contact points, verify the webhook URL:

     ```bash
     curl -X POST -H 'Content-type: application/json' \
       --data '{"text":"Test alert from Grafana"}' \
       https://hooks.slack.com/services/YOUR/WEBHOOK/URL
     ```

  4. For email contact points, check the SMTP configuration:

     ```ini
     # In grafana.ini
     [smtp]
     enabled = true
     host = smtp.example.com:587
     user = noreply@example.com
     password = smtp-password
     from_address = grafana@example.com
     ```

  5. For PagerDuty contact points, verify the routing key (the Events API v2 requires `summary`, `source`, and `severity` in the payload):

     ```bash
     curl -X POST https://events.pagerduty.com/v2/enqueue \
       -H 'Content-Type: application/json' \
       -d '{
         "routing_key": "your-routing-key",
         "event_action": "trigger",
         "dedup_key": "test-alert",
         "payload": {"summary": "Test alert", "source": "grafana", "severity": "critical"}
       }'
     ```
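
When a contact point fails intermittently, retrying by hand helps separate transient network failures from a broken configuration. A minimal sketch assuming a generic JSON webhook; `notify`, `WEBHOOK_URL`, and the retry counts are illustrative, not Grafana features:

```shell
#!/bin/sh
# Sketch: POST a JSON payload with simple retries. Persistent failures
# point at configuration; an occasional transient failure at the network.
WEBHOOK_URL="${WEBHOOK_URL:-https://hooks.slack.com/services/YOUR/WEBHOOK/URL}"

notify() {
  url="$1"; payload="$2"
  for attempt in 1 2 3; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -sf -X POST -H 'Content-Type: application/json' \
        --data "$payload" "$url" > /dev/null; then
      echo "delivered on attempt $attempt"
      return 0
    fi
    sleep "$attempt"
  done
  echo "delivery failed after 3 attempts"
  return 1
}

notify "$WEBHOOK_URL" '{"text":"Test alert"}' || echo "check the webhook URL in the contact point"
```

If all attempts fail with the same HTTP error, the problem is the URL or credentials in the contact point, not the network.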

Alert Rule Evaluation Issues

  1. Increase the evaluation timeout for complex queries:

     ```ini
     # In grafana.ini
     [unified_alerting]
     evaluation_timeout = 30s
     max_attempts = 3
     ```

  2. Check the alert rule evaluation logs:

     ```bash
     journalctl -u grafana-server -f | grep -Ei "alert|eval"
     ```

  3. Verify the rule evaluation frequency is appropriate:

     - The rule group interval should match the data resolution
     - The evaluation interval should be >= the query range
     - For 1-minute resolution data, use a 1-minute evaluation interval
Alert Manager Issues

  1. Check the notification policy configuration:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/v1/provisioning/policies | jq .
     ```

  2. Verify no silence rules are blocking notifications:

     - Navigate to Alerting > Silences
     - Check for active silences
     - Delete or expire any unintended silences

  3. Check for inhibition rules that might suppress notifications. Inhibition lives in the Alertmanager configuration itself, which you can inspect through the Alertmanager-compatible API:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/alertmanager/grafana/api/v2/status | jq '.config'
     ```
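
Active silences can also be counted from the Alertmanager-compatible API. A sketch assuming Grafana's built-in Alertmanager and basic-auth admin credentials; `count_active` is an illustrative helper that only pattern-matches the JSON (use `jq` for anything more serious):

```shell
#!/bin/sh
# Sketch: count active silences. Each silence carries a status.state of
# "active", "pending", or "expired"; zero active silences rules out
# silencing as the reason notifications are missing.
count_active() {
  grep -o '"state":"active"' | wc -l
}

curl -s -u admin:password \
  "http://localhost:3000/api/alertmanager/grafana/api/v2/silences" | count_active
```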

Provisioning Issues

  1. Check the Grafana logs for provisioning errors after restarting the service:

     ```bash
     sudo systemctl restart grafana-server
     journalctl -u grafana-server --since "2 minutes ago" | grep -i provision
     ```
  2. For YAML-provisioned alerts, verify the file syntax and indentation (rule groups require `folder` and `interval`, and `condition` must reference a refId present in `data`):

     ```yaml
     # Example valid alert rule
     apiVersion: 1
     groups:
       - orgId: 1
         name: system-alerts
         folder: system
         interval: 1m
         rules:
           - uid: alert-1
             title: High CPU Usage
             condition: A
             data:
               - refId: A
                 relativeTimeRange:
                   from: 600
                   to: 0
                 datasourceUid: prometheus
                 model:
                   expr: rate(cpu_usage[5m]) > 0.8
                   refId: A
     ```
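
A common cause of provisioning parse failures is tab indentation, which YAML forbids. A minimal lint sketch; `check_tabs` is a hypothetical helper that catches only this one class of error and is not a full YAML validator:

```shell
#!/bin/sh
# Minimal lint: flag tab-indented lines, which make the YAML parser
# reject the whole provisioning file.
check_tabs() {
  [ -f "$1" ] || { echo "SKIP: $1 not found"; return 2; }
  if grep -n "$(printf '\t')" "$1"; then
    echo "FAIL: tab characters found in $1"
    return 1
  fi
  echo "OK: no tabs in $1"
}

# Demonstrate on a throwaway file containing a tab-indented line:
tmp=$(mktemp)
printf 'groups:\n\t- name: bad-indent\n' > "$tmp"
check_tabs "$tmp" || echo "fix the indentation before restarting Grafana"
rm -f "$tmp"
```

Point it at your actual provisioning files (e.g. under /etc/grafana/provisioning/alerting/) before restarting Grafana.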

Verification

  1. Verify alert evaluation through the Prometheus-compatible rules API:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/prometheus/grafana/api/v1/rules | \
       jq '.data.groups[].rules[] | {name: .name, state: .state}'
     ```

  2. Check the alert history:

     - Navigate to Alerting > Alert history
     - Verify evaluations are occurring
     - Check for patterns in failures

  3. Confirm notifications are delivered:

     - Trigger a test alert
     - Verify the notification appears at the contact point
     - Check the contact point's response logs
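
The verification steps can be rolled into one quick summary. A sketch against the Prometheus-compatible rules API, assuming basic-auth admin credentials; `summarize` is an illustrative helper that just tallies rule states:

```shell
#!/bin/sh
# Sketch: tally rule states (inactive/pending/firing). A healthy setup
# shows mostly "inactive"; anything unexpected warrants a look at the
# evaluation logs from the earlier steps.
summarize() {
  grep -o '"state":"[a-z]*"' | sort | uniq -c
}

curl -s -u admin:password \
  "http://localhost:3000/api/prometheus/grafana/api/v1/rules" | summarize
```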