Introduction

Grafana's alerting system notifies teams when metrics cross defined thresholds. When alerting fails, you may see errors like "failed to evaluate alert rule," "notification failed," or alerts that never fire despite conditions being met. These issues can be caused by datasource problems, query errors, notification channel misconfiguration, or alert rule conflicts.

Symptoms

  • Alert rules show "Error" or "Unknown" state instead of Normal/Alerting
  • Error: "failed to evaluate queries" in alert rule status
  • Notifications are not sent when alerts fire
  • Error: "contact point failed" or "notification delivery failed"
  • Alerts fire but immediately resolve, creating notification spam
  • Alert history shows no evaluations despite conditions being met

Common Causes

  • Datasource query returns no data or invalid format
  • Alert rule query references deleted or renamed metrics
  • Contact point (notification channel) has invalid configuration
  • Alert rule evaluation timeout is too short for complex queries
  • Datasource is unreachable during alert evaluation
  • Silence or inhibition rules are blocking notifications
  • Alert manager configuration is invalid

Step-by-Step Fix

Datasource Query Issues

  1. Test the alert query directly in Explore:

     - Navigate to Explore and run the exact query from the alert rule
     - Check for errors or empty results

  2. Verify the datasource is healthy:

     ```bash
     curl -s "http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=up" | jq .
     ```

  3. Check for query syntax errors in the alert rule:

     - Navigate to Alerting > Alert rules
     - Click on the failing rule
     - Check "Preview" for query errors
     - Ensure the query returns the expected format (instant vs. range)

  4. Fix "No data" handling. In the alert rule, set the "No data" behavior to one of:

     - No Data (keeps the current state)
     - Alerting (fires the alert)
     - OK (resolves the alert)
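
The empty-result check above can be scripted. This is a minimal sketch assuming a Prometheus datasource proxied at id 1 and Grafana on localhost:3000; `check_result` and the defaults are illustrative helpers, not Grafana tooling:

```shell
#!/bin/sh
# Sketch: run the alert query through the datasource proxy and flag
# empty results, which surface as "No data" at evaluation time.
# GRAFANA_URL and datasource id 1 are assumptions; adjust for your setup.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
QUERY="${1:-up}"

check_result() {
  # A Prometheus instant query with no matching series returns "result":[]
  case "$1" in
    *'"result":[]'*) echo "EMPTY: query returned no series" ;;
    *'"result":'*)   echo "OK: query returned data" ;;
    *)               echo "ERROR: unexpected or empty response" ;;
  esac
}

check_result "$(curl -s "$GRAFANA_URL/api/datasources/proxy/1/api/v1/query?query=$QUERY")"
```

An EMPTY result means the rule will take whichever "No data" behavior is configured above, so fix the query (or the behavior) accordingly.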

Contact Point Issues

  1. Verify the contact point configuration:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/v1/provisioning/contact-points | jq .
     ```

  2. Test contact point delivery:

     - Navigate to Alerting > Contact points
     - Click "Test" on the contact point
     - Verify the test notification is received

  3. For Slack contact points, verify the webhook URL:

     ```bash
     curl -X POST -H 'Content-type: application/json' \
       --data '{"text":"Test alert from Grafana"}' \
       https://hooks.slack.com/services/YOUR/WEBHOOK/URL
     ```

  4. For email contact points, check the SMTP configuration:

     ```ini
     # In grafana.ini
     [smtp]
     enabled = true
     host = smtp.example.com:587
     user = noreply@example.com
     password = smtp-password
     from_address = grafana@example.com
     ```

  5. For PagerDuty contact points, verify the routing key (the Events API v2 requires `summary`, `source`, and `severity` in the payload):

     ```bash
     curl -X POST https://events.pagerduty.com/v2/enqueue \
       -H 'Content-Type: application/json' \
       -d '{
         "routing_key": "your-routing-key",
         "event_action": "trigger",
         "dedup_key": "test-alert",
         "payload": {"summary": "Test alert", "source": "grafana", "severity": "critical"}
       }'
     ```
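
When a contact point fails intermittently, retrying by hand helps separate transient network failures from a broken configuration. A minimal sketch assuming a generic JSON webhook; `notify`, `WEBHOOK_URL`, and the retry counts are illustrative, not Grafana features:

```shell
#!/bin/sh
# Sketch: POST a JSON payload with simple retries. Persistent failures
# point at configuration; an occasional transient failure at the network.
WEBHOOK_URL="${WEBHOOK_URL:-https://hooks.slack.com/services/YOUR/WEBHOOK/URL}"

notify() {
  url="$1"; payload="$2"
  for attempt in 1 2 3; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -sf -X POST -H 'Content-Type: application/json' \
        --data "$payload" "$url" > /dev/null; then
      echo "delivered on attempt $attempt"
      return 0
    fi
    sleep "$attempt"
  done
  echo "delivery failed after 3 attempts"
  return 1
}

notify "$WEBHOOK_URL" '{"text":"Test alert"}' || echo "check the webhook URL in the contact point"
```

If all attempts fail with the same HTTP error, the problem is the URL or credentials in the contact point, not the network.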

Alert Rule Evaluation Issues

  1. Increase the evaluation timeout for complex queries:

     ```ini
     # In grafana.ini
     [unified_alerting]
     evaluation_timeout = 30s
     max_attempts = 3
     ```

  2. Check the alert rule evaluation logs:

     ```bash
     journalctl -u grafana-server -f | grep -Ei "alert|eval"
     ```

  3. Verify the rule evaluation frequency is appropriate:

     - The rule group interval should match the data resolution
     - The evaluation interval should be >= the query range
     - For 1-minute resolution data, use a 1-minute evaluation interval
Alert Manager Issues

  1. Check the notification policy configuration:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/v1/provisioning/policies | jq .
     ```

  2. Verify no silence rules are blocking notifications:

     - Navigate to Alerting > Silences
     - Check for active silences
     - Delete or expire any unintended silences

  3. Check for inhibition rules that might suppress notifications. Inhibition lives in the Alertmanager configuration itself, which you can inspect through the Alertmanager-compatible API:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/alertmanager/grafana/api/v2/status | jq '.config'
     ```
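
Active silences can also be counted from the Alertmanager-compatible API. A sketch assuming Grafana's built-in Alertmanager and basic-auth admin credentials; `count_active` is an illustrative helper that only pattern-matches the JSON (use `jq` for anything more serious):

```shell
#!/bin/sh
# Sketch: count active silences. Each silence carries a status.state of
# "active", "pending", or "expired"; zero active silences rules out
# silencing as the reason notifications are missing.
count_active() {
  grep -o '"state":"active"' | wc -l
}

curl -s -u admin:password \
  "http://localhost:3000/api/alertmanager/grafana/api/v2/silences" | count_active
```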

Provisioning Issues

  1. Check the Grafana logs for provisioning errors after restarting the service:

     ```bash
     sudo systemctl restart grafana-server
     journalctl -u grafana-server --since "2 minutes ago" | grep -i provision
     ```
  2. For YAML-provisioned alerts, verify the file syntax and indentation (rule groups require `folder` and `interval`, and `condition` must reference a refId present in `data`):

     ```yaml
     # Example valid alert rule
     apiVersion: 1
     groups:
       - orgId: 1
         name: system-alerts
         folder: system
         interval: 1m
         rules:
           - uid: alert-1
             title: High CPU Usage
             condition: A
             data:
               - refId: A
                 relativeTimeRange:
                   from: 600
                   to: 0
                 datasourceUid: prometheus
                 model:
                   expr: rate(cpu_usage[5m]) > 0.8
                   refId: A
     ```
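
A common cause of provisioning parse failures is tab indentation, which YAML forbids. A minimal lint sketch; `check_tabs` is a hypothetical helper that catches only this one class of error and is not a full YAML validator:

```shell
#!/bin/sh
# Minimal lint: flag tab-indented lines, which make the YAML parser
# reject the whole provisioning file.
check_tabs() {
  [ -f "$1" ] || { echo "SKIP: $1 not found"; return 2; }
  if grep -n "$(printf '\t')" "$1"; then
    echo "FAIL: tab characters found in $1"
    return 1
  fi
  echo "OK: no tabs in $1"
}

# Demonstrate on a throwaway file containing a tab-indented line:
tmp=$(mktemp)
printf 'groups:\n\t- name: bad-indent\n' > "$tmp"
check_tabs "$tmp" || echo "fix the indentation before restarting Grafana"
rm -f "$tmp"
```

Point it at your actual provisioning files (e.g. under /etc/grafana/provisioning/alerting/) before restarting Grafana.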

Verification

  1. Verify alert evaluation through the Prometheus-compatible rules API:

     ```bash
     curl -s -u admin:password http://localhost:3000/api/prometheus/grafana/api/v1/rules | \
       jq '.data.groups[].rules[] | {name: .name, state: .state}'
     ```

  2. Check the alert history:

     - Navigate to Alerting > Alert history
     - Verify evaluations are occurring
     - Check for patterns in failures

  3. Confirm notifications are delivered:

     - Trigger a test alert
     - Verify the notification appears at the contact point
     - Check the contact point's response logs
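
The verification steps can be rolled into one quick summary. A sketch against the Prometheus-compatible rules API, assuming basic-auth admin credentials; `summarize` is an illustrative helper that just tallies rule states:

```shell
#!/bin/sh
# Sketch: tally rule states (inactive/pending/firing). A healthy setup
# shows mostly "inactive"; anything unexpected warrants a look at the
# evaluation logs from the earlier steps.
summarize() {
  grep -o '"state":"[a-z]*"' | sort | uniq -c
}

curl -s -u admin:password \
  "http://localhost:3000/api/prometheus/grafana/api/v1/rules" | summarize
```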