Your Grafana dashboards are showing "Datasource connection failed" or "No data" errors, and you're losing visibility into your systems. This is a common issue that can stem from network problems, authentication failures, or misconfiguration. Let's walk through the systematic approach to diagnose and fix these problems.
Understanding the Error
Grafana datasource errors typically appear in several ways:
In the UI:
```
Datasource connection failed
Error querying datasource: bad gateway
Failed to call resource
```

In Grafana logs:

```
logger=tsdb.prometheus t=2024-01-15T10:23:45.123Z level=error msg="Failed to query datasource" err="Post \"http://prometheus:9090/api/v1/query\": dial tcp: lookup prometheus: no such host"
logger=sqlstore t=2024-01-15T10:23:45.123Z level=error msg="Failed to connect to database" err="dial tcp 10.0.0.5:3306: connect: connection refused"
```

Initial Diagnosis
Start by checking the datasource configuration and testing the connection:
```bash
# Get current datasource configuration via API
curl -s http://admin:password@localhost:3000/api/datasources | jq '.[] | {name: .name, type: .type, url: .url}'

# Test a specific datasource
curl -s http://admin:password@localhost:3000/api/datasources/1/health | jq '.'
```
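When scripting these checks, it helps to reduce the health response to a single PASS/FAIL. A minimal sketch (the `summarize_health` helper is hypothetical, and assumes `jq` is installed):

```shell
# Hypothetical helper: summarize a datasource health-check JSON response.
summarize_health() {
  # Reads the health JSON on stdin and prints PASS or FAIL
  local status
  status=$(jq -r '.status // "unknown"')
  if [ "$status" = "OK" ]; then
    echo "PASS"
  else
    echo "FAIL ($status)"
  fi
}
```

Usage: `curl -s http://admin:password@localhost:3000/api/datasources/1/health | summarize_health`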
Check Grafana logs for connection errors:
```bash
# For systemd installations (use -E for the | alternation)
journalctl -u grafana-server -f | grep -iE "datasource|connection|error"

# For Docker/Kubernetes
kubectl logs -l app=grafana -n monitoring -f | grep -iE "datasource|connection"

# Check the main Grafana log file
tail -f /var/log/grafana/grafana.log | grep -iE "datasource|error"
```
Common Cause 1: Network Connectivity
The most common cause is that Grafana cannot reach the datasource host.
Diagnosis:
```bash
# Test connectivity from the Grafana server to the datasource
curl -v 'http://prometheus-server:9090/api/v1/query?query=up'

# For Kubernetes environments, test from inside the Grafana pod
kubectl exec -it grafana-0 -n monitoring -- sh
curl http://prometheus:9090/-/healthy

# Check DNS resolution
nslookup prometheus-server
dig prometheus-server +short

# Check if the port is open
nc -zv prometheus-server 9090

# Test with the exact URL from the datasource config
curl -v http://prometheus-server.monitoring.svc.cluster.local:9090/-/healthy
```
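When you have several candidate URLs (service name, FQDN, pod IP), a small loop saves repetition. A sketch; the `check_endpoint` helper is hypothetical:

```shell
# Hypothetical helper: probe an endpoint with a short timeout and report
# OK/FAIL, so you can quickly test several candidate datasource URLs.
check_endpoint() {
  local url="$1"
  # -f treats HTTP errors as failures; --max-time bounds slow endpoints
  if curl -sf --max-time 5 -o /dev/null "$url"; then
    echo "OK   $url"
  else
    echo "FAIL $url"
  fi
}
```

Usage: `for u in http://prometheus:9090/-/healthy http://prometheus-server.monitoring.svc.cluster.local:9090/-/healthy; do check_endpoint "$u"; done`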
Solution:
Fix the network path or update the datasource URL:
```bash
# For Kubernetes, update the datasource URL to use the service name.
# Note: the datasource update endpoint is PUT and replaces the whole
# object, so include the other fields as well.
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://prometheus-server.monitoring.svc.cluster.local:9090"
  }'
```

For firewall issues:
```bash
# Check firewall rules
iptables -L -n | grep 9090

# For firewalld
firewall-cmd --list-all

# Allow traffic if needed
firewall-cmd --add-port=9090/tcp --permanent
firewall-cmd --reload
```
Common Cause 2: Authentication Failures
Many datasources require authentication, and incorrect credentials will cause connection failures.
Error patterns:
```
Error 401: Unauthorized
Error 403: Forbidden
```

Diagnosis:
```bash
# Test the datasource with authentication
curl -u username:password 'http://datasource-host:9090/api/v1/query?query=up'

# Test with a basic auth header
curl -H "Authorization: Basic $(echo -n 'user:password' | base64)" \
  'http://datasource-host:9090/api/v1/query?query=up'

# Test with a bearer token
curl -H "Authorization: Bearer your-token" \
  'http://datasource-host:9090/api/v1/query?query=up'

# For databases, test the connection directly
mysql -h mysql-host -u grafana -p -e "SELECT 1"
psql -h postgres-host -U grafana -d grafana -c "SELECT 1"
```
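A frequent source of 401s is a header that doesn't match what a working client sends (trailing newline in the base64 input, swapped user and password). A hypothetical helper that builds the exact header value for comparison:

```shell
# Hypothetical helper: construct the Basic auth header from a username
# and password, using printf so no trailing newline ends up in the
# base64 input (a common cause of mismatched headers).
basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64)"
}
```

Usage: `curl -H "$(basic_auth_header grafana s3cret)" 'http://datasource-host:9090/api/v1/query?query=up'` (credentials are placeholders).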
Solution:
Update datasource configuration with correct credentials:
```bash
# Update via API (PUT replaces the whole datasource object)
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "basicAuth": true,
    "basicAuthUser": "admin",
    "secureJsonData": {
      "basicAuthPassword": "newpassword"
    }
  }'
```

For database datasources:
```bash
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "PostgreSQL",
    "type": "postgres",
    "access": "proxy",
    "url": "postgres:5432",
    "database": "grafana",
    "user": "grafana",
    "secureJsonData": {
      "password": "securepassword"
    },
    "jsonData": {
      "sslmode": "disable"
    }
  }'
```

Common Cause 3: TLS/SSL Certificate Issues
When datasources use HTTPS, certificate problems can prevent connections.
Error patterns:
```
x509: certificate signed by unknown authority
x509: certificate has expired or is not yet valid
```

Diagnosis:
```bash
# Check certificate validity dates
echo | openssl s_client -connect datasource-host:443 -servername datasource-host 2>/dev/null \
  | openssl x509 -noout -dates

# Check the certificate chain
openssl s_client -connect datasource-host:443 -servername datasource-host -showcerts

# Test the connection skipping TLS verification (diagnosis only)
curl -k https://datasource-host/metrics
```
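For "expired or not yet valid" errors, it helps to turn openssl's `notAfter` line into a number of days. A sketch (the `days_until_expiry` helper is hypothetical and assumes GNU `date` for timestamp parsing):

```shell
# Hypothetical helper: convert an openssl "notAfter=..." line into days
# until expiry. Negative output means the certificate has expired.
# Assumes GNU date (Linux); BSD/macOS date parses dates differently.
days_until_expiry() {
  local end="${1#notAfter=}"          # strip the "notAfter=" prefix
  local end_s now_s
  end_s=$(date -d "$end" +%s)         # expiry as a Unix timestamp
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}
```

Usage: `days_until_expiry "$(echo | openssl s_client -connect datasource-host:443 2>/dev/null | openssl x509 -noout -enddate)"`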
Solution:
Option 1: Add the custom CA to the trust store Grafana uses. Grafana relies on the operating system's certificate store for outbound datasource connections, so add the CA there:

```bash
# Debian/Ubuntu
cp /path/to/ca.crt /usr/local/share/ca-certificates/my-ca.crt
update-ca-certificates

# RHEL/CentOS
cp /path/to/ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust

# Restart Grafana to pick up the new trust store
systemctl restart grafana-server
```
Option 2: Configure datasource to skip TLS verification (not recommended for production):
```bash
# PUT replaces the whole datasource object, so include the other fields
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "https://prometheus:9090",
    "jsonData": {
      "tlsSkipVerify": true
    }
  }'
```

Option 3: Add a custom CA to the datasource:
```bash
# tlsAuthWithCACert enables CA verification; tlsAuth is only needed
# when the datasource also requires a client certificate
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "SecurePrometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "https://prometheus:9090",
    "jsonData": {
      "tlsAuthWithCACert": true
    },
    "secureJsonData": {
      "tlsCACert": "-----BEGIN CERTIFICATE-----\nMIID...\n-----END CERTIFICATE-----"
    }
  }'
```

Common Cause 4: Datasource Service Issues
Sometimes the datasource itself is not running or is unhealthy.
Diagnosis:
```bash
# Check if Prometheus is running
curl http://prometheus:9090/-/healthy
curl http://prometheus:9090/-/ready

# Check Prometheus status
systemctl status prometheus

# For Kubernetes
kubectl get pods -l app=prometheus -n monitoring
kubectl logs -l app=prometheus -n monitoring --tail=50

# Check database connectivity
kubectl exec -it postgres-0 -- pg_isready

# Check Elasticsearch health
curl http://elasticsearch:9200/_cluster/health
```
Solution:
Fix the datasource service:
```bash
# Restart Prometheus if it's down
kubectl rollout restart deployment/prometheus-server -n monitoring

# Check for resource constraints
kubectl describe pod prometheus-server-0 -n monitoring

# Check events
kubectl get events -n monitoring --sort-by='.lastTimestamp'
```
Common Cause 5: Proxy and Access Mode Issues
Grafana has two access modes: server (proxy) and browser (direct). The wrong setting can cause failures.
Server (Proxy) mode: the Grafana server makes the request to the datasource.
Browser (Direct) mode: the user's browser makes the request directly.
Diagnosis:
```bash
# Check the current access mode
curl -s http://admin:password@localhost:3000/api/datasources | jq '.[] | {name: .name, access: .access}'

# For browser/direct mode, test from your local machine
curl 'http://datasource-host:9090/api/v1/query?query=up'
```
Solution:
Update access mode based on your network topology:
```bash
# Set to server/proxy mode (most common); the update endpoint is PUT
# and replaces the whole object, so include the other fields
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy"
  }'

# Or browser/direct mode (requires the datasource to be reachable from the user's browser)
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "direct"
  }'
```
Common Cause 6: Timeout Issues
Large queries or slow datasources can cause timeouts.
Error pattern:
```
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
Solution:
Increase timeout settings:
```bash
# The update endpoint is PUT and replaces the whole object,
# so include the other fields
curl -X PUT http://admin:password@localhost:3000/api/datasources/1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://prometheus:9090",
    "jsonData": {
      "timeout": 60,
      "httpMethod": "POST"
    }
  }'
```

Or in grafana.ini:
```ini
[dataproxy]
timeout = 60
dialTimeout = 30
```
Common Cause 7: Resource Limits
Grafana or the datasource might be resource-constrained.
Diagnosis:
```bash
# Check Grafana resource usage (requires admin credentials)
curl -s http://admin:password@localhost:3000/api/admin/stats | jq '.'

# For Kubernetes
kubectl top pods -n monitoring
kubectl describe pod grafana-0 -n monitoring

# Check the Grafana database configuration
# (escape the brackets so grep treats them literally)
grep -A 10 '\[database\]' /etc/grafana/grafana.ini
```
Solution:
```yaml
# Increase resources for Grafana (Kubernetes)
resources:
  limits:
    cpu: "2"
    memory: "2Gi"
  requests:
    cpu: "500m"
    memory: "512Mi"
```

Verification
After making changes, verify the datasource is working:
```bash
# Test datasource health
curl -s http://admin:password@localhost:3000/api/datasources/1/health | jq '.'

# Run a test query through the Grafana proxy
curl -s 'http://admin:password@localhost:3000/api/datasources/proxy/1/api/v1/query?query=up' | jq '.'

# Check in the Grafana UI:
# Navigate to Configuration > Data Sources > [Your Datasource] > Test
```
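Once the fix is verified, consider provisioning datasources declaratively instead of hand-editing them via the API, so a bad change can be reviewed and reverted in version control. A sketch of a Grafana provisioning file (the file path, names, and URL are placeholders for your environment):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (example path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.monitoring.svc.cluster.local:9090
    isDefault: true
    jsonData:
      timeout: 60
      httpMethod: POST
```

Grafana loads these files at startup, so misconfigured datasources are recreated correctly on every restart.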
Prevention
Set up monitoring for datasource health:
```yaml
# Add to Prometheus alerting rules
groups:
  - name: grafana_health
    rules:
      - alert: GrafanaDatasourceDown
        # Use rate() so the alert clears once errors stop; the raw
        # counter stays nonzero forever after a single error
        expr: rate(grafana_datasource_request_total{status="error"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Grafana datasource {{ $labels.datasource }} is failing"
```

Regular health checks:
```bash
#!/bin/bash
# Check every datasource's health; run from a cron job or monitoring system
DATASOURCES=$(curl -s http://admin:password@localhost:3000/api/datasources | jq -r '.[].id')
for ID in $DATASOURCES; do
  HEALTH=$(curl -s http://admin:password@localhost:3000/api/datasources/$ID/health | jq -r '.status')
  if [ "$HEALTH" != "OK" ]; then
    echo "Datasource $ID is unhealthy: $HEALTH"
    # Send alert
  fi
done
```

The key to resolving datasource connection issues is to test connectivity at each layer: network, authentication, TLS, and service health. Start with the simplest tests and work your way through the stack.