The Problem
You have two Prometheus instances running in HA mode, but you're experiencing issues:
- Duplicate alerts firing from both instances
- Inconsistent data between the two Prometheus servers
- Alertmanager receiving alerts from both without deduplication
```
level=warn ts=2026-04-04T23:55:12.345Z caller=alerting.go:234 msg="Alert already exists" alert="HighCPU" instance="prometheus-1"
level=error ts=2026-04-04T23:55:13.456Z caller=alerting.go:235 msg="Duplicate alert source" sources="prometheus-0,prometheus-1"
```

HA pair issues cause alert noise, data gaps, and unreliable monitoring.
Diagnosis
Check External Labels
```bash
# Check external labels on each Prometheus
# (the API returns the config as a YAML string under .data.yaml)
curl -s http://prometheus-0:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A5 'external_labels'
curl -s http://prometheus-1:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A5 'external_labels'
```

Check Alertmanager Connections
```bash
# Verify Alertmanager status (the v2 API returns objects directly, not under .data)
curl -s http://alertmanager:9093/api/v2/status | jq .

# Check active silences
curl -s http://alertmanager:9093/api/v2/silences | jq .
```
Check for Duplicate Alerts
```promql
# Count firing alerts per replica (run against a global view such as
# Thanos Query; external labels are not stored in each replica's local TSDB)
count by (prometheus) (ALERTS{alertstate="firing"})

# Firing alerts that are missing the replica label
ALERTS{alertstate="firing", prometheus=""}
```
Check Data Consistency
PromQL's `@` modifier takes a Unix timestamp, not a server name, so cross-instance comparison has to go through each instance's HTTP API:

```bash
# Compare 'up' results from both instances
curl -s 'http://prometheus-0:9090/api/v1/query?query=up{job="node-exporter"}' | jq '.data.result'
curl -s 'http://prometheus-1:9090/api/v1/query?query=up{job="node-exporter"}' | jq '.data.result'

# Compare the newest sample timestamps
curl -s 'http://prometheus-0:9090/api/v1/query?query=timestamp(up{job="node-exporter"})' | jq '.data.result[].value'
curl -s 'http://prometheus-1:9090/api/v1/query?query=timestamp(up{job="node-exporter"})' | jq '.data.result[].value'
```
Solutions
1. Configure External Labels
Each Prometheus instance must have unique external labels:
```yaml
# prometheus-0 (prometheus.yml)
global:
  external_labels:
    cluster: 'production'
    prometheus: 'prometheus-0'
    replica: '0'

# prometheus-1 (prometheus.yml)
global:
  external_labels:
    cluster: 'production'
    prometheus: 'prometheus-1'
    replica: '1'
```
Prometheus attaches these external labels to every alert it sends. Alertmanager deduplicates alerts whose label sets are identical, so the per-replica labels must be dropped from outgoing alerts via `alert_relabel_configs`.
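A minimal sketch of that relabeling, added to prometheus.yml on both replicas (the Alertmanager address is an assumption):

```yaml
# prometheus.yml: strip per-replica labels from outgoing alerts so both
# copies arrive at Alertmanager with identical label sets
alerting:
  alert_relabel_configs:
    - regex: replica|prometheus
      action: labeldrop
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # assumed address
```

The `cluster` label survives the relabeling, so alerts from different clusters still stay distinct.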
2. Configure Alertmanager for Deduplication
Alertmanager deduplicates alerts whose label sets are identical:
```yaml
# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  # Do NOT group by the per-replica label, or each replica's copy
  # lands in its own group and notifies separately
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://notification-service/webhook'

# Inhibition: a firing critical alert silences the matching warning
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster']
```
The `group_by` list must not include the per-replica label; deduplication relies on both copies of an alert arriving with identical label sets.
3. Deduplicate via Thanos/VictoriaMetrics
For long-term storage deduplication:
Thanos deduplicates at query time, not in Receive: both replicas remote-write to Thanos Receive, and Thanos Query strips the per-replica label when merging series:

```bash
# Thanos Query: treat 'prometheus' as the replica label and deduplicate
thanos query \
  --query.replica-label=prometheus \
  --endpoint=thanos-receive-0:10901 \
  --endpoint=thanos-receive-1:10901
```

Or use VictoriaMetrics, which drops samples spaced closer than the configured interval:

```bash
# Single-node VictoriaMetrics; in cluster mode set this on vmstorage and vmselect
victoria-metrics -dedup.minScrapeInterval=15s
```

Configure Prometheus remote write on both instances:

```yaml
# Both Prometheus instances
remote_write:
  - url: "https://thanos-receive:19291/api/v1/write"
# external_labels (including the replica label) are attached to
# remote-written samples automatically
```

4. Fix Scrape Configuration Differences
Both Prometheus instances should have identical scrape configurations:

```bash
# Compare configuration files
diff prometheus-0.yml prometheus-1.yml

# Or via the API
curl -s http://prometheus-0:9090/api/v1/status/config > config-0.json
curl -s http://prometheus-1:9090/api/v1/status/config > config-1.json
diff config-0.json config-1.json
```
Ensure identical configs:
```yaml
# Shared configuration file for both instances (prometheus.yml)
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# The only difference should be external_labels; inject them via
# environment variables or per-replica files
```
Using environment variables:
```yaml
# prometheus.yml (env expansion in external_labels requires the
# --enable-feature=expand-external-labels flag)
global:
  external_labels:
    cluster: 'production'
    prometheus: '${PROMETHEUS_REPLICA}'
```

```bash
# Set the replica name per instance
export PROMETHEUS_REPLICA=prometheus-0
prometheus --config.file=prometheus.yml --enable-feature=expand-external-labels
```
5. Handle Alert Evaluation Timing
The two replicas evaluate rules at slightly different wall-clock times, so the same alert can fire at different moments:
```yaml
# prometheus.yml: sync evaluation intervals
global:
  evaluation_interval: 30s  # Same on both replicas
```

```yaml
# Rule file: use consistent 'for' durations
groups:
  - name: application_alerts
    rules:
      - alert: HighCPU
        expr: rate(process_cpu_seconds_total[1m]) > 0.8
        for: 5m  # Should be well above the scrape and evaluation intervals
```
6. Configure Kubernetes HA
For Kubernetes deployments:
```yaml
# prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceName: prometheus-headless
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/data'
            # Prometheus has no CLI flag for external labels; expand
            # ${POD_NAME} inside external_labels in prometheus.yml instead
            - '--enable-feature=expand-external-labels'
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
```

Service configuration:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: ClusterIP
  ports:
    - port: 9090
  selector:
    app: prometheus
---
# Headless service for the StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: prometheus-headless
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - port: 9090
  selector:
    app: prometheus
```

7. Alertmanager HA Configuration
For Alertmanager HA:
Alertmanager clustering is configured with command-line flags rather than in alertmanager.yml:

```bash
# Run on each instance, listing every peer
alertmanager \
  --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager-0:9094 \
  --cluster.peer=alertmanager-1:9094 \
  --cluster.gossip-interval=10s \
  --cluster.peer-timeout=30s
```
Kubernetes deployment:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  replicas: 2
  serviceName: alertmanager
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:latest
          args:
            - '--config.file=/etc/alertmanager/alertmanager.yml'
            - '--cluster.listen-address=0.0.0.0:9094'
            # Peer addresses resolve via the headless service
            - '--cluster.peer=alertmanager-0.alertmanager:9094'
            - '--cluster.peer=alertmanager-1.alertmanager:9094'
          ports:
            - containerPort: 9093
            - containerPort: 9094
```

Verification
Check Alert Deduplication
```bash
# Verify alerts from both sources
curl -s http://alertmanager:9093/api/v2/alerts | jq '.[] | {labels: .labels, fingerprint: .fingerprint}'

# Check Alertmanager cluster status
curl -s http://alertmanager:9093/api/v2/status | jq '.cluster'
```
Verify External Labels
```promql
# On a global view (e.g. Thanos Query): confirm each replica reports its
# own value for the 'prometheus' label (external labels are not visible
# in each replica's local TSDB)
count by (prometheus) ({__name__=~"prometheus_.+"})

# Query series from each replica
{prometheus="prometheus-0"}
{prometheus="prometheus-1"}
```
Check Data Consistency
```bash
# Compare sample counts
curl -s 'http://prometheus-0:9090/api/v1/query?query=count({__name__=~".+"})' | jq '.data.result[0].value'
curl -s 'http://prometheus-1:9090/api/v1/query?query=count({__name__=~".+"})' | jq '.data.result[0].value'
```

Prevention
Add monitoring for HA pair:
```yaml
groups:
  - name: ha_pair_alerts
    rules:
      # Evaluate these on a meta-monitoring Prometheus or a global view;
      # external labels are not visible in each replica's local TSDB
      - alert: PrometheusReplicaMissingExternalLabel
        expr: absent({prometheus=~".+"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus replica missing external label"
          description: "Prometheus is missing the 'prometheus' external label required for HA"

      - alert: AlertmanagerHADown
        expr: max(alertmanager_cluster_members) < 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Alertmanager cluster degraded"
          description: "Alertmanager reports {{ $value }} cluster members, expected 2"

      - alert: PrometheusHAConfigMismatch
        # Threshold is illustrative; tune it to your series cardinality
        expr: |
          abs(
            sum(prometheus_tsdb_head_series{prometheus="prometheus-0"})
            - sum(prometheus_tsdb_head_series{prometheus="prometheus-1"})
          ) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus HA replicas ingest different series counts"

      - alert: DuplicateAlertSources
        # On a deduplicated global view the replica label should already be
        # stripped; more than one value per alert means dedup is not working
        expr: |
          count by (alertname) (
            count by (alertname, prometheus) (ALERTS{alertstate="firing"})
          ) > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Duplicate alerts detected"
          description: "Alert {{ $labels.alertname }} firing from multiple replicas"
```
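These rules assume a meta-monitoring Prometheus (or a global view such as Thanos Query) that sees both replicas under distinct `prometheus` labels. A minimal scrape config sketch for that setup, with all hostnames as assumptions:

```yaml
# Meta-monitoring prometheus.yml (sketch): scrape both HA replicas and
# both Alertmanagers, attaching a per-replica 'prometheus' label
scrape_configs:
  - job_name: 'prometheus-ha'
    static_configs:
      - targets: ['prometheus-0:9090']
        labels:
          prometheus: 'prometheus-0'
      - targets: ['prometheus-1:9090']
        labels:
          prometheus: 'prometheus-1'
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager-0:9093', 'alertmanager-1:9093']
```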