Introduction
APIs deliver webhook events to registered endpoint URLs using HTTP POST requests with retry logic. When the endpoint consistently times out -- due to being down, slow, or returning errors -- the webhook system exhausts its retry attempts and marks the delivery as permanently failed. This means critical events (payment confirmations, deployment notifications, security alerts) are never delivered to the intended recipient.
Symptoms
- Webhook dashboard shows events with
failedstatus andmax retries exceeded - Receiving endpoint never gets the webhook POST request
- Webhook delivery logs show repeated timeout errors
- Events are not redelivered after the endpoint recovers
- Error message:
Webhook delivery failed after 5 retries: timeout after 30s
Common Causes
- Webhook endpoint server is down or unreachable
- Endpoint response time exceeds the webhook delivery timeout
- Firewall or network policy blocking webhook POST requests
- Endpoint returning 5xx errors for every webhook delivery
- SSL/TLS certificate expired on the webhook endpoint
Step-by-Step Fix
- 1.Check the webhook endpoint health: Verify the endpoint is reachable.
- 2.```bash
- 3.curl -v -X POST https://webhook-endpoint.example.com/webhook \
- 4.-H "Content-Type: application/json" \
- 5.-d '{"test": true}'
- 6.# Check response time and status code
- 7.
` - 8.Manually replay failed webhook events: Redeliver lost events.
- 9.```bash
- 10.# Via API (example for Stripe-like webhook system)
- 11.curl -X POST https://api.example.com/webhooks/wh_abc123/redeliver \
- 12.-H "Authorization: Bearer $API_KEY"
- 13.# Or replay all failed events
- 14.curl -X POST https://api.example.com/webhooks/redeliver-all-failed \
- 15.-H "Authorization: Bearer $API_KEY"
- 16.
` - 17.Increase the webhook delivery timeout and retry count: Give the endpoint more time.
- 18.```yaml
- 19.webhook_config:
- 20.timeout: 60s # Increased from 30s
- 21.max_retries: 10 # Increased from 5
- 22.retry_backoff: exponential
- 23.initial_delay: 1s
- 24.max_delay: 300s
- 25.
` - 26.Fix the endpoint to respond promptly: Optimize the webhook handler.
- 27.```python
- 28.# Fast webhook handler - acknowledge immediately, process asynchronously
- 29.@app.post('/webhook')
- 30.async def handle_webhook(request: Request):
- 31.# Acknowledge immediately (within timeout)
- 32.event = await request.json()
- 33.# Queue for async processing
- 34.await task_queue.enqueue('process_webhook', event)
- 35.return JSONResponse(status_code=200, content={'received': True})
- 36.
` - 37.Verify webhook delivery succeeds: Test end-to-end.
- 38.```bash
- 39.# Trigger a test webhook event
- 40.curl -X POST https://api.example.com/webhooks/test \
- 41.-H "Authorization: Bearer $API_KEY"
- 42.# Check the webhook endpoint received it
- 43.# Check the webhook dashboard shows 'delivered' status
- 44.
`
Prevention
- Design webhook endpoints to acknowledge receipt within 5 seconds and process asynchronously
- Monitor webhook delivery success rates and alert on failure spikes
- Implement webhook endpoint health checks that run before delivery attempts
- Provide a webhook event log API that consumers can poll as a delivery fallback
- Configure webhook delivery with exponential backoff retry schedules
- Offer multiple webhook endpoint URLs for redundancy (primary and failover)