Introduction

APIs deliver webhook events to registered endpoint URLs using HTTP POST requests with retry logic. When the endpoint consistently times out -- due to being down, slow, or returning errors -- the webhook system exhausts its retry attempts and marks the delivery as permanently failed. This means critical events (payment confirmations, deployment notifications, security alerts) are never delivered to the intended recipient.

Symptoms

  • Webhook dashboard shows events with failed status and max retries exceeded
  • Receiving endpoint never gets the webhook POST request
  • Webhook delivery logs show repeated timeout errors
  • Events are not redelivered after the endpoint recovers
  • Error message: Webhook delivery failed after 5 retries: timeout after 30s

Common Causes

  • Webhook endpoint server is down or unreachable
  • Endpoint response time exceeds the webhook delivery timeout
  • Firewall or network policy blocking webhook POST requests
  • Endpoint returning 5xx errors for every webhook delivery
  • SSL/TLS certificate expired on the webhook endpoint

Step-by-Step Fix

  1. 1.Check the webhook endpoint health: Verify the endpoint is reachable.
  2. 2.```bash
  3. 3.curl -v -X POST https://webhook-endpoint.example.com/webhook \
  4. 4.-H "Content-Type: application/json" \
  5. 5.-d '{"test": true}'
  6. 6.# Check response time and status code
  7. 7.`
  8. 8.Manually replay failed webhook events: Redeliver lost events.
  9. 9.```bash
  10. 10.# Via API (example for Stripe-like webhook system)
  11. 11.curl -X POST https://api.example.com/webhooks/wh_abc123/redeliver \
  12. 12.-H "Authorization: Bearer $API_KEY"
  13. 13.# Or replay all failed events
  14. 14.curl -X POST https://api.example.com/webhooks/redeliver-all-failed \
  15. 15.-H "Authorization: Bearer $API_KEY"
  16. 16.`
  17. 17.Increase the webhook delivery timeout and retry count: Give the endpoint more time.
  18. 18.```yaml
  19. 19.webhook_config:
  20. 20.timeout: 60s # Increased from 30s
  21. 21.max_retries: 10 # Increased from 5
  22. 22.retry_backoff: exponential
  23. 23.initial_delay: 1s
  24. 24.max_delay: 300s
  25. 25.`
  26. 26.Fix the endpoint to respond promptly: Optimize the webhook handler.
  27. 27.```python
  28. 28.# Fast webhook handler - acknowledge immediately, process asynchronously
  29. 29.@app.post('/webhook')
  30. 30.async def handle_webhook(request: Request):
  31. 31.# Acknowledge immediately (within timeout)
  32. 32.event = await request.json()
  33. 33.# Queue for async processing
  34. 34.await task_queue.enqueue('process_webhook', event)
  35. 35.return JSONResponse(status_code=200, content={'received': True})
  36. 36.`
  37. 37.Verify webhook delivery succeeds: Test end-to-end.
  38. 38.```bash
  39. 39.# Trigger a test webhook event
  40. 40.curl -X POST https://api.example.com/webhooks/test \
  41. 41.-H "Authorization: Bearer $API_KEY"
  42. 42.# Check the webhook endpoint received it
  43. 43.# Check the webhook dashboard shows 'delivered' status
  44. 44.`

Prevention

  • Design webhook endpoints to acknowledge receipt within 5 seconds and process asynchronously
  • Monitor webhook delivery success rates and alert on failure spikes
  • Implement webhook endpoint health checks that run before delivery attempts
  • Provide a webhook event log API that consumers can poll as a delivery fallback
  • Configure webhook delivery with exponential backoff retry schedules
  • Offer multiple webhook endpoint URLs for redundancy (primary and failover)