Introduction

Cosmos DB returns HTTP 429 (Too Many Requests) when your application exceeds the Request Units per second (RU/s) provisioned for a container or database. The response carries an `x-ms-retry-after-ms` header that tells the client how long to wait before retrying; without retry logic that honors it, throttled requests surface as application errors.
Symptoms

- HTTP 429 responses with a message such as "Request rate is large. RetryAfterMs: 1234"
- Normalized RU Consumption in the Azure portal at or near 100%
- Increased latency while the client waits between retries
- Application timeouts once the retry budget is exhausted
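Common Causes

- Provisioned RU/s too low for the workload
- Hot partition: a single logical partition key receives a disproportionate share of requests and exhausts the RU/s allocated to its physical partition, even when the container as a whole has headroom
- Cross-partition queries consuming excessive RUs
- Missing or misconfigured retry policy in the SDK
- Autoscale maximum RU/s set too low for peak traffic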
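Azure Cosmos DB divides a container's provisioned RU/s evenly among its physical partitions, and each logical partition lives on exactly one of them. The arithmetic below, using made-up numbers, shows how one hot partition key range can be throttled while container-wide utilization stays well under 100%. `PerPartitionBudget` and `IsThrottled` are illustrative helpers, not SDK APIs.

```csharp
using System;
using System.Linq;

// Provisioned RU/s is split evenly across physical partitions.
double PerPartitionBudget(double provisionedRuPerSec, int physicalPartitions) =>
    provisionedRuPerSec / physicalPartitions;

// A partition is throttled when its own demand exceeds its share,
// no matter how much capacity the other partitions leave unused.
bool IsThrottled(double demandRuPerSec, double budgetRuPerSec) =>
    demandRuPerSec > budgetRuPerSec;

// Hypothetical inputs: 10,000 RU/s over 5 physical partitions, one hot key range.
const double provisioned = 10_000;
double share = PerPartitionBudget(provisioned, 5);   // 2,000 RU/s each
double[] demand = { 3_000, 500, 400, 300, 200 };

Console.WriteLine($"Container-wide utilization: {demand.Sum() / provisioned:P0}");
for (int i = 0; i < demand.Length; i++)
    Console.WriteLine($"Partition {i}: {demand[i]} RU/s, throttled: {IsThrottled(demand[i], share)}");
```

Here the container is at 44% overall, yet partition 0 asks for 3,000 RU/s against a 2,000 RU/s share and is throttled. Raising total RU/s helps only proportionally; a better-distributed partition key fixes the skew itself.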
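Step-by-Step Fix

1. **Check current RU consumption**: query peak utilization (`NormalizedRUConsumption`), RUs consumed (`TotalRequestUnits`), and throttled requests (`TotalRequests` filtered to status code 429):

```bash
az monitor metrics list --resource <cosmos-resource-id> --metrics "NormalizedRUConsumption" --aggregation Maximum --interval PT1M
az monitor metrics list --resource <cosmos-resource-id> --metrics "TotalRequestUnits" --aggregation Total --interval PT1M
az monitor metrics list --resource <cosmos-resource-id> --metrics "TotalRequests" --filter "StatusCode eq '429'" --interval PT1M
```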
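`NormalizedRUConsumption` reports the highest utilization among partition key ranges, while `TotalRequestUnits` reflects consumption across the whole container. Comparing the two helps separate a hot partition from a container that is simply under-provisioned. The function below is a hypothetical rule of thumb with assumed thresholds (90% and 50%), not an Azure-defined classification.

```csharp
using System;

// Hypothetical triage helper for one PT1M interval. The 90% and 50%
// thresholds are illustrative assumptions, not documented limits.
string Diagnose(double normalizedRuPercent, double totalRuInMinute, double provisionedRuPerSec)
{
    // TotalRequestUnits with Total aggregation over PT1M is the RUs consumed in
    // that minute; divide by 60 to compare against provisioned RU/s.
    double averageUtilization = totalRuInMinute / 60.0 / provisionedRuPerSec;

    if (normalizedRuPercent < 90)
        return "no sustained saturation";
    return averageUtilization < 0.5
        ? "likely hot partition"
        : "likely overall capacity";
}

Console.WriteLine(Diagnose(100, 180_000, 10_000)); // average 3,000 RU/s
Console.WriteLine(Diagnose(100, 540_000, 10_000)); // average 9,000 RU/s
```

A "likely hot partition" result is worth confirming with the portal's Normalized RU Consumption split by PartitionKeyRangeId before changing the partition key.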
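2. **Increase provisioned RU/s or switch to autoscale**: migrating a container to autoscale and setting its maximum are separate commands:

```bash
az cosmosdb sql container throughput migrate \
  --account-name my-cosmos --database-name mydb --name mycontainer \
  --resource-group my-rg --throughput-type autoscale
az cosmosdb sql container throughput update \
  --account-name my-cosmos --database-name mydb --name mycontainer \
  --resource-group my-rg --max-throughput 10000
```

To stay on manual throughput instead, run the same `update` command with `--throughput <RU/s>`.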
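An autoscale container scales between 10% of its configured maximum and the maximum itself, so `--max-throughput 10000` gives a range of 1,000 to 10,000 RU/s. The sketch below does that arithmetic and sizes a maximum from an observed peak. `MaxThroughputForPeak` is a hypothetical helper: it rounds up to the 1,000 RU/s increments autoscale maximums use and deliberately adds no headroom above the peak.

```csharp
using System;

// Autoscale range: from 10% of the maximum up to the maximum.
(int Min, int Max) AutoscaleRange(int maxThroughput) => (maxThroughput / 10, maxThroughput);

// Hypothetical sizing helper: smallest autoscale maximum (a multiple of 1,000,
// and at least 1,000 RU/s) that covers an observed peak, with no extra headroom.
int MaxThroughputForPeak(double peakRuPerSec) =>
    Math.Max(1_000, (int)Math.Ceiling(peakRuPerSec / 1_000) * 1_000);

var (min, max) = AutoscaleRange(10_000);
Console.WriteLine($"--max-throughput 10000 scales between {min} and {max} RU/s");
Console.WriteLine($"A 7,300 RU/s peak needs --max-throughput {MaxThroughputForPeak(7_300)}");
```

If the observed peak came from `TotalRequestUnits` while a single partition was hot, raising the maximum alone may not stop the 429s.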
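3. **Tune the SDK's built-in rate-limit retries (C#)**: the .NET SDK v3 retries 429 responses automatically, waiting the interval the service returns in the retry-after header before each attempt. These two options bound how long it keeps trying (defaults: 9 attempts and 30 seconds of cumulative wait):

```csharp
using System;
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions {
    ConnectionMode = ConnectionMode.Direct, // SDK default; lower latency than Gateway
    MaxRetryAttemptsOnRateLimitedRequests = 10, // retries per throttled request
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30) // cap on cumulative wait
};
var client = new CosmosClient(connectionString, options); // create once and reuse
```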
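When the SDK exhausts `MaxRetryAttemptsOnRateLimitedRequests` or `MaxRetryWaitTimeOnRateLimitedRequests`, the 429 reaches your code. Below is a minimal sketch of one more bounded, application-level retry layer, assuming the wrapped operation is safe to repeat: it waits for the server's retry-after hint when one is available and otherwise falls back to capped exponential backoff with full jitter. `RetryOnThrottleAsync` and its parameters are hypothetical; the Cosmos-specific classifier appears only as a comment so the block runs without the SDK.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper. getRetryAfter classifies a failure: null means "not
// throttled, rethrow"; TimeSpan.Zero means "throttled, but no server hint".
async Task<T> RetryOnThrottleAsync<T>(
    Func<Task<T>> operation,
    Func<Exception, TimeSpan?> getRetryAfter,
    int maxAttempts = 5)
{
    const double baseMs = 100, capMs = 10_000;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxAttempts && getRetryAfter(ex) is TimeSpan hint)
        {
            // Prefer the server's hint; otherwise use capped exponential backoff
            // (100 ms, 200 ms, 400 ms, ...) with full jitter.
            double backoffMs = Math.Min(capMs, baseMs * Math.Pow(2, attempt - 1));
            TimeSpan delay = hint > TimeSpan.Zero
                ? hint
                : TimeSpan.FromMilliseconds(Random.Shared.NextDouble() * backoffMs);
            await Task.Delay(delay);
        }
    }
}

// With Microsoft.Azure.Cosmos referenced, a classifier could be:
// ex => ex is CosmosException { StatusCode: HttpStatusCode.TooManyRequests } ce
//     ? ce.RetryAfter ?? TimeSpan.Zero
//     : (TimeSpan?)null

// Simulated usage: the first two calls fail as if throttled.
int calls = 0;
string result = await RetryOnThrottleAsync(
    async () =>
    {
        await Task.Yield();
        if (++calls < 3) throw new TimeoutException("simulated throttling");
        return "ok";
    },
    ex => ex is TimeoutException ? TimeSpan.FromMilliseconds(10) : (TimeSpan?)null);
Console.WriteLine($"{result} after {calls} calls");
```

Keep this outer layer small: stacking many application retries on top of the SDK's own retries multiplies load on an already throttled container.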
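4. **Use bulk execution for large operations**: bulk mode groups concurrent point operations into batched service requests. Each operation still consumes RUs and is subject to the same rate-limit retries:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions { AllowBulkExecution = true };
var client = new CosmosClient(connectionString, options);
var container = client.GetContainer("mydb", "mycontainer");
// items: the documents to write; the SDK reads each one's partition key and batches by partition.
var tasks = items.Select(item => container.CreateItemAsync(item));
await Task.WhenAll(tasks);
```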
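Bulk mode raises how fast the client can send writes, not how many RUs the container can serve, so a large load against a fixed RU/s budget either takes a predictable amount of time or gets throttled. The arithmetic below is a sketch with a hypothetical per-item cost (measure the real one from the `RequestCharge` of a representative write); it estimates the duration at a given RU/s and the RU/s needed to hit a target time.

```csharp
using System;

// Every write consumes RUs, so total cost divided by RU/s bounds the load time.
// ruPerItem is an assumption to be replaced with a measured RequestCharge.
double EstimateSeconds(long itemCount, double ruPerItem, double ruPerSec) =>
    itemCount * ruPerItem / ruPerSec;

double RequiredRuPerSec(long itemCount, double ruPerItem, double targetSeconds) =>
    itemCount * ruPerItem / targetSeconds;

// Hypothetical load: 1,000,000 items at about 10 RU each.
Console.WriteLine($"At 10,000 RU/s: about {EstimateSeconds(1_000_000, 10, 10_000):F0} s");
Console.WriteLine($"To finish in 5 minutes: about {RequiredRuPerSec(1_000_000, 10, 300):F0} RU/s");
```

With these inputs the load needs roughly 1,000 seconds at 10,000 RU/s, or about 33,000 RU/s to finish in five minutes; leave headroom for any other traffic on the same container.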