# Fix Envoy Rate Limit Configuration with envoyproxy/ratelimit
You deployed Envoy as an API gateway with rate limiting, but users can still make unlimited requests. Or worse, every request returns 429 Too Many Requests even though traffic is low.

```
$ curl -I https://api.example.com/v1/users
HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
retry-after: 60
```

The envoyproxy/ratelimit service provides global rate limiting for Envoy, but misconfiguration is common.
## Real Scenario: Rate Limits Not Applied
A SaaS company deployed Envoy with rate limiting to protect their API. They configured limits of 100 requests per minute per user, but monitoring showed some users making 1000+ requests per minute without being blocked.
The problem: The rate limit filter was in the Envoy configuration, but the domain didn't match between Envoy and the rate limit service.
Envoy config had:

```yaml
domain: "production-api"
```

Rate limit config had:

```yaml
domain: api-production  # Different name!
```

Because the domains didn't match, Envoy's rate limit requests were ignored by the service.
## Architecture Overview
```
┌─────────┐     ┌──────────────┐     ┌──────────────────┐     ┌───────┐
│ Client  │────▶│ Envoy Proxy  │────▶│ Rate Limit Svc   │────▶│ Redis │
└─────────┘     └──────────────┘     └──────────────────┘     └───────┘
                       │
                       ▼
              ┌────────────────┐
              │ Backend Service│
              └────────────────┘
```

1. Client sends request to Envoy
2. Envoy calls the rate limit service (gRPC on port 6070)
3. Rate limit service checks Redis for counters
4. Service returns OK or OVER_LIMIT
5. Envoy forwards the request or returns 429
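Step 3 is the crux: counters live in Redis, keyed by domain, descriptor, and the current time window, so every Envoy replica shares the same counts. As a rough illustration only, here is a fixed-window counter sketched in shell. The key layout, variable names, and 5-per-minute limit are made up for the example; they are not the service's real Redis schema or algorithm:

```shell
#!/usr/bin/env bash
# Toy fixed-window rate limiter: at most LIMIT requests per 60-second window.
# Illustrates the counting idea only; envoyproxy/ratelimit's real key format
# and implementation differ.
declare -A counters
LIMIT=5

check_limit() {
  local domain=$1 descriptor=$2 now=$3
  local window=$(( now / 60 ))                    # current minute bucket
  local key="${domain}_${descriptor}_${window}"
  counters[$key]=$(( ${counters[$key]:-0} + 1 ))  # like a Redis INCR
  if (( counters[$key] > LIMIT )); then
    echo "OVER_LIMIT"
  else
    echo "OK"
  fi
}

# Seven requests inside the same window: 1-5 print OK, 6-7 print OVER_LIMIT
for i in 1 2 3 4 5 6 7; do
  printf 'request %s: ' "$i"
  check_limit my-api user_id_user123 100
done
```

When the clock crosses a window boundary a fresh key is used and the count restarts, which is why fixed windows can briefly admit a burst of up to twice the limit across the boundary.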
## Quick Start with Docker
Test the setup locally before deploying to production:
```bash
# 1. Create a shared network and start Redis
docker network create ratelimit-net
docker run -d --name redis \
  --network ratelimit-net \
  redis:7-alpine

# 2. Create rate limit configuration
mkdir -p /tmp/ratelimit/config
cat > /tmp/ratelimit/config/ratelimit-config.yaml << 'EOF'
domain: my-api
descriptors:
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100
EOF

# 3. Start rate limit service
docker run -d --name ratelimit \
  --network ratelimit-net \
  -p 8080:8080 \
  -p 6070:6070 \
  -e REDIS_SOCKET_TYPE=tcp \
  -e REDIS_TCP_HOST=redis \
  -e REDIS_TCP_PORT=6379 \
  -e RUNTIME_ROOT=/data \
  -e RUNTIME_SUBDIRECTORY=ratelimit \
  -e RUNTIME_WATCH_ROOT=false \
  -e USE_STATSD=false \
  -e GRPC_PORT=6070 \
  -v /tmp/ratelimit:/data \
  envoyproxy/ratelimit:latest

# 4. Verify the service is running
curl http://localhost:8080/healthcheck
# Should return: OK
```
## Kubernetes Deployment

### Step 1: Create Namespace and Redis
```yaml
# redis.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ratelimit
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: ratelimit
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
          livenessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: ratelimit
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
```

### Step 2: Deploy Rate Limit Service
```yaml
# ratelimit.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: ratelimit
data:
  ratelimit-config.yaml: |
    domain: production-api
    descriptors:
      # Per-user rate limit: 100 requests per minute,
      # with a nested per-user per-endpoint limit of 30 per minute
      # (nested under the single user_id entry so the key isn't duplicated)
      - key: user_id
        rate_limit:
          unit: minute
          requests_per_unit: 100
        descriptors:
          - key: endpoint
            rate_limit:
              unit: minute
              requests_per_unit: 30

      # Per-IP rate limit: 20 requests per second
      - key: remote_address
        rate_limit:
          unit: second
          requests_per_unit: 20
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: ratelimit
spec:
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
        - name: ratelimit
          image: envoyproxy/ratelimit:latest
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 6070
              name: grpc
          env:
            - name: REDIS_SOCKET_TYPE
              value: "tcp"
            - name: REDIS_TCP_HOST
              value: "redis"
            - name: REDIS_TCP_PORT
              value: "6379"
            - name: RUNTIME_ROOT
              value: "/data"
            - name: RUNTIME_SUBDIRECTORY
              value: "ratelimit"
            - name: RUNTIME_WATCH_ROOT
              value: "false"
            - name: RUNTIME_IGNOREDOTFILES
              value: "true"
            - name: LOG_LEVEL
              value: "info"
            - name: USE_STATSD
              value: "false"
            - name: GRPC_PORT
              value: "6070"
            - name: PORT
              value: "8080"
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          volumeMounts:
            - name: config
              mountPath: /data/ratelimit/config
              readOnly: true
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 3
      volumes:
        - name: config
          configMap:
            name: ratelimit-config
---
apiVersion: v1
kind: Service
metadata:
  name: ratelimit
  namespace: ratelimit
spec:
  selector:
    app: ratelimit
  ports:
    - port: 8080
      name: http
      targetPort: 8080
    - port: 6070
      name: grpc
      targetPort: 6070
```
### Step 3: Configure Envoy
```yaml
# envoy.yaml
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: backend_service
                http_filters:
                  # Rate limit filter MUST come before router
                  - name: envoy.filters.http.ratelimit
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
                      domain: "production-api"  # MUST match config file
                      failure_mode_deny: false  # Don't block if rate limit service is down
                      timeout: 0.5s
                      rate_limited_as_resource_exhausted: true
                      rate_limit_service:
                        grpc_service:
                          envoy_grpc:
                            cluster_name: ratelimit_cluster
                        transport_api_version: V3
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: backend_service
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: backend_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: backend-service.default.svc.cluster.local
                      port_value: 8080

    - name: ratelimit_cluster
      connect_timeout: 0.5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      http2_protocol_options: {}  # Required for gRPC
      load_assignment:
        cluster_name: ratelimit_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: ratelimit.ratelimit.svc.cluster.local
                      port_value: 6070
```
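One gotcha with this listener: the rate limit filter alone does not generate descriptors. Envoy only calls the rate limit service for routes or virtual hosts that define `rate_limits` actions, as shown in the configuration examples later in this guide. A minimal sketch of the virtual host with an action wired in; the `x-user-id` header name is an assumption for illustration:

```yaml
# Sketch: the virtual host from envoy.yaml with a rate_limits action added.
# Without at least one action, Envoy never sends descriptors for the route,
# so no limits apply.
virtual_hosts:
  - name: local_service
    domains: ["*"]
    rate_limits:
      - actions:
          - request_headers:
              header_name: x-user-id   # assumed client-supplied header
              descriptor_key: user_id
    routes:
      - match:
          prefix: "/"
        route:
          cluster: backend_service
```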
## Common Configuration Errors

### Error 1: Domain Mismatch
Symptom: Rate limits not applied, all requests pass through.
Cause: The domain in Envoy config doesn't match the rate limit config.
Debug:
```bash
# Check Envoy config
kubectl exec -n default deployment/envoy -- curl -s localhost:9901/config_dump | jq '.configs[0].dynamic_listeners[0].active_state.listener.filter_chains[0].filters[1].typed_config.domain'

# Check rate limit config
kubectl get configmap ratelimit-config -n ratelimit -o yaml | grep domain
```
Fix: Ensure both match exactly:
```yaml
# Envoy config
domain: "production-api"

# Rate limit config
domain: production-api  # No quotes needed in YAML
```
### Error 2: Missing HTTP/2 on gRPC Port
Symptom: Rate limit service unreachable, timeouts in Envoy logs.
Cause: gRPC requires HTTP/2, but the cluster doesn't have http2_protocol_options.
Envoy error log:
```
[warning][config] [source/extensions/filters/http/ratelimit/ratelimit.cc:79] rate limit service cluster 'ratelimit_cluster' is not configured for HTTP/2
```

Fix:

```yaml
- name: ratelimit_cluster
  http2_protocol_options: {}  # Add this line
```

### Error 3: All Requests Blocked
Symptom: Every request returns 429, even the first request.
Cause: Redis connection failure or misconfigured rate limit values.
Debug:
```bash
# Check rate limit service logs
kubectl logs -n ratelimit deployment/ratelimit --tail=100

# Look for Redis connection errors:
# "redis: connection refused" or "dial tcp: connection refused"

# Test Redis connectivity from the rate limit pod
kubectl exec -n ratelimit deployment/ratelimit -- nc -zv redis 6379
# Should output: redis.ratelimit.svc.cluster.local (10.0.0.1) 6379 (?) open
```
Fix: Verify Redis is running and accessible:
```bash
kubectl get pods -n ratelimit -l app=redis
kubectl logs -n ratelimit deployment/redis
```

### Error 4: Rate Limit Service Crashes
Symptom: Container exits immediately with error.
Cause: Missing required environment variables.
Debug:
```bash
kubectl logs -n ratelimit deployment/ratelimit
# Look for: "REDIS_SOCKET_TYPE must be set"
```

Required environment variables:
```yaml
env:
  - name: REDIS_SOCKET_TYPE
    value: "tcp"
  - name: REDIS_TCP_HOST
    value: "redis"
  - name: REDIS_TCP_PORT
    value: "6379"
```

### Error 5: Wrong Filter Order
Symptom: Envoy rejects the configuration at startup, or rate limiting is silently never applied.
Cause: Rate limit filter must come before the router filter.
Wrong:
```yaml
http_filters:
  - name: envoy.filters.http.router     # Router first - WRONG
  - name: envoy.filters.http.ratelimit
```

Correct:

```yaml
http_filters:
  - name: envoy.filters.http.ratelimit  # Rate limit first
  - name: envoy.filters.http.router     # Router last
```

## Rate Limit Configuration Examples
### Per-User Rate Limiting

```yaml
domain: production-api
descriptors:
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100
```

Envoy must send the user_id descriptor:
```yaml
# In Envoy route configuration
rate_limits:
  - actions:
      - request_headers:
          header_name: x-user-id
          descriptor_key: user_id
```

### IP-Based Rate Limiting
```yaml
domain: production-api
descriptors:
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 20
```

Envoy configuration:
```yaml
rate_limits:
  - actions:
      - remote_address: {}
```

### Nested Rate Limits
```yaml
domain: production-api
descriptors:
  # First check: per-user limit
  - key: user_id
    rate_limit:
      unit: minute
      requests_per_unit: 100
    # Second check: per-user per-endpoint limit
    descriptors:
      - key: endpoint
        rate_limit:
          unit: minute
          requests_per_unit: 30
```

This allows 100 requests/minute total per user, but only 30/minute for each endpoint.
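For the nested limit to fire, Envoy must send a descriptor containing both entries in order: first `user_id`, then `endpoint`. A sketch of matching route-level actions; the `x-user-id` header and the static `generic_key` endpoint label are assumptions for illustration:

```yaml
rate_limits:
  # Descriptor ("user_id") alone -> matches the 100/minute limit
  - actions:
      - request_headers:
          header_name: x-user-id
          descriptor_key: user_id
  # Descriptor ("user_id", "endpoint") -> matches the nested 30/minute limit
  - actions:
      - request_headers:
          header_name: x-user-id
          descriptor_key: user_id
      - generic_key:
          descriptor_key: endpoint
          descriptor_value: "/api/users"   # assumed static endpoint label
```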
## Testing Rate Limits
```bash
# Send 25 requests and count HTTP status codes
for i in {1..25}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "x-user-id: user123" \
    http://localhost:10000/api/test
done | sort | uniq -c

# Expected output (with 20/sec limit):
#   20 200
#    5 429
```
## Production Best Practices

### 1. Use failure_mode_deny Carefully
```yaml
# Safe for non-critical APIs
failure_mode_deny: false  # Allow traffic if rate limit service fails

# Safe only if you prefer outages over overuse
failure_mode_deny: true   # Block all traffic if rate limit service fails
```
### 2. Set Appropriate Timeouts

```yaml
timeout: 0.5s  # Don't let rate limit checks slow down requests
```

### 3. Monitor Rate Limit Metrics
```bash
# Check Envoy stats
curl http://localhost:9901/stats | grep ratelimit

# Key metrics:
# cluster.ratelimit_cluster.upstream_rq_total
# cluster.ratelimit_cluster.upstream_cx_connect_fail
# http.ratelimit.over_limit
# http.ratelimit.ok
```
### 4. Use Redis Sentinel for High Availability
```yaml
env:
  - name: REDIS_TYPE
    value: "sentinel"
  - name: REDIS_SENTINEL_MASTER_NAME
    value: "mymaster"
  - name: REDIS_SENTINEL_ADDRESSES
    value: "sentinel1:26379,sentinel2:26379,sentinel3:26379"
```

## Troubleshooting Checklist
1. Verify the rate limit service is healthy:

   ```bash
   curl http://ratelimit.ratelimit.svc.cluster.local:8080/healthcheck
   ```

2. Check Redis connectivity:

   ```bash
   kubectl exec -n ratelimit deployment/ratelimit -- redis-cli -h redis ping
   ```

3. Verify the domain matches:

   ```bash
   # Envoy domain
   curl -s localhost:9901/config_dump | jq '.configs[0].dynamic_listeners[0].active_state.listener.filter_chains[0].filters[1].typed_config.domain'

   # Rate limit config domain
   kubectl get configmap ratelimit-config -n ratelimit -o yaml | grep domain
   ```

4. Check the Envoy cluster is configured:

   ```bash
   curl -s localhost:9901/clusters | grep ratelimit
   ```

5. Watch the rate limit service logs:

   ```bash
   kubectl logs -n ratelimit deployment/ratelimit -f
   ```