Introduction

False positive alerts from synthetic monitoring erode trust in the alerting system and cause alert fatigue. When checks fail due to transient issues rather than real problems, engineers start ignoring alerts.

Symptoms

- Alerts firing and resolving within minutes
- Alert correlates with known network blips
- Same check fails from one location but passes from others
- HTTP check fails with timeout but service is healthy
- Alert rate too high for the actual incident rate

Common Causes

- Check timeout too aggressive
- Single location check hitting network issues
- DNS resolution intermittent
- SSL certificate check with wrong validation
- Check not accounting for CDN cache miss latency

Step-by-Step Fix

1. **Adjust check timeout and retries**: Increase timeout from 10s to 30s and add retry count.
2. **Use multiple check locations**: Configure checks from at least 3 geographic locations. Alert only when 2+ locations fail.
3. **Add alert deduplication and cooldown**: Configure alert cooldown period (e.g., 15 minutes) before re-firing.
4. **Review check configuration**:

```yaml
# Example synthetic check config
http:
  url: https://api.example.com/health
  method: GET
  timeout: 30s
  retries: 2
  locations:
    - us-east-1
    - eu-west-1
    - ap-southeast-1
  assertion:
    - type: statusCode
      value: 200
    - type: responseTime
      value: 5000 # 5 seconds
```

Prevention

- Set timeout to 3x the p99 response time
- Use multiple check locations for geographic redundancy
- Implement alert cooldown and deduplication
- Regularly review and tune alert thresholds
- Track false positive rate as a monitoring metric
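
The 2-of-3 location quorum, the 15-minute cooldown and the 3x p99 timeout rule described above, written out as an added example (it is not part of the original page or of any particular monitoring product). The names `AlertEvaluator`, `replay` and `recommended_timeout_ms`, and the sample rounds, are assumptions constructed for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

def recommended_timeout_ms(latencies_ms):
    """3x the p99 latency, using the nearest-rank percentile."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100       # ceil(0.99 * n)
    return 3 * ordered[rank - 1]

@dataclass
class AlertEvaluator:
    quorum: int = 2                              # alert only when 2+ locations fail
    cooldown: timedelta = timedelta(minutes=15)  # minimum gap before re-firing
    last_fired: Optional[datetime] = None

    def evaluate(self, ts, results):
        """results maps location -> True (pass) / False (fail) for one round."""
        failed = sum(1 for ok in results.values() if not ok)
        if failed == 0:
            return "ok"
        if failed < self.quorum:
            return "suppressed"                  # single-location blip, no page
        if self.last_fired is not None and ts - self.last_fired < self.cooldown:
            return "cooldown"                    # real failure, already alerted
        self.last_fired = ts
        return "fire"

def replay(rounds):
    """Run a sequence of (timestamp, results) rounds through one evaluator."""
    ev = AlertEvaluator()
    return [ev.evaluate(ts, results) for ts, results in rounds]

# Constructed sample data: five check rounds from three locations.
T0 = datetime(2024, 1, 1, 12, 0)
def _r(minute, us, eu, ap):
    return (T0 + timedelta(minutes=minute),
            {"us-east-1": us, "eu-west-1": eu, "ap-southeast-1": ap})
SAMPLE_ROUNDS = [
    _r(0, True, True, True),
    _r(1, True, False, True),     # one location blips
    _r(2, False, False, True),    # quorum reached
    _r(5, False, False, False),   # still failing, inside cooldown
    _r(20, False, True, False),   # cooldown elapsed
]

if __name__ == "__main__":
    print(replay(SAMPLE_ROUNDS))
    print(recommended_timeout_ms([120 + i for i in range(100)]), "ms")
```

Counting the `suppressed` outcomes over time gives the false positive rate that the prevention list recommends tracking.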