Introduction

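AWS Lambda functions have a configurable timeout (3 seconds by default, up to 15 minutes). Functions often hit this limit because of inefficient code, slow cold starts (particularly for VPC-attached functions), or waiting on external resources such as databases and third-party APIs. Identifying the root cause is essential, since raising the timeout alone often only delays the failure.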

Symptoms

  • CloudWatch Logs show "Task timed out after X.XX seconds"
  • Function execution stops before the handler returns a result
  • API Gateway returns 504 Gateway Timeout
  • Step Functions show failed Lambda invocations
  • Function works locally but times out in AWS
  • Intermittent timeouts (suggests cold start issues)
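To tell cold-start timeouts apart from steady-state ones, it can help to parse the REPORT line Lambda writes to CloudWatch Logs at the end of each invocation: the `Init Duration` field is reported only when the invocation included an initialization phase. The sketch below assumes the standard tab-separated REPORT format used in the test line; `parse_report_line` is a hypothetical helper, not part of any AWS SDK.

```python
import re

# Captures "<name>: <number> ms" pairs; longer names come first in the alternation
_FIELD = re.compile(r"(Init Duration|Billed Duration|Duration): ([\d.]+) ms")


def parse_report_line(line):
    """Extract durations from a Lambda REPORT log line; return None for other lines."""
    if not line.startswith("REPORT"):
        return None
    fields = {name: float(value) for name, value in _FIELD.findall(line)}
    return {
        "duration_ms": fields.get("Duration"),
        "init_ms": fields.get("Init Duration"),
        # Init Duration only appears when the invocation included initialization
        "cold_start": "Init Duration" in fields,
    }
```

If slow or timed-out invocations cluster on entries where `cold_start` is true, initialization is the first thing to investigate.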

Common Causes

  • Timeout setting lower than function execution time
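  • VPC networking adding to cold-start time (historically several seconds per cold start; largely eliminated since AWS moved VPC network interface setup to function creation time in 2019)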
  • Infinite loops or blocking operations in code
  • Database connections not using connection pooling
  • Waiting for external API responses without a timeout
  • Insufficient memory slowing execution (Lambda allocates CPU in proportion to configured memory)
  • Recursive invocations without termination condition
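For the recursive-invocation case, one common guard is to carry a depth counter in the payload and stop once it reaches a limit. A minimal sketch: the `depth` field, the `MAX_DEPTH` value, and the `next_payload` helper are all hypothetical, and the actual re-invocation call (for example through boto3) is left out.

```python
MAX_DEPTH = 5  # hypothetical cap on self-invocations per request chain


def next_payload(event):
    """Build the payload for the next self-invocation, or return None to stop."""
    depth = event.get("depth", 0)  # hypothetical counter carried in the event
    if depth >= MAX_DEPTH:
        return None  # termination condition: do not invoke again
    return {**event, "depth": depth + 1}
```

The handler would only re-invoke itself when `next_payload` returns a payload rather than `None`.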

Step-by-Step Fix

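  1. Enable provisioned concurrency so initialized execution environments are ready before requests arrive (this removes cold starts for up to the configured number of concurrent requests):

```bash
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier PROD \
  --provisioned-concurrent-executions 10
```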
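A rough way to size the provisioned concurrency value is the estimate AWS gives in its Lambda concurrency documentation: average requests per second multiplied by average duration in seconds. A sketch; `estimate_concurrency` and its optional headroom factor are assumptions for illustration, not an AWS API.

```python
import math


def estimate_concurrency(requests_per_second, avg_duration_ms, headroom=0.0):
    """Estimate concurrent executions as rate x duration, plus optional headroom."""
    concurrency = requests_per_second * (avg_duration_ms / 1000)
    # Round up: a fractional execution environment still needs a whole one
    return math.ceil(concurrency * (1 + headroom))
```

For example, 10 requests per second at 250 ms each needs about 2.5 environments, which rounds up to 3; adding 50% headroom gives 4.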

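  2. Optimize function code with async patterns so I/O-bound calls run concurrently:

```python
import asyncio
import aiohttp  # not bundled with the Lambda runtime; package it with the function


async def fetch_data(session, url):
    # Per-request timeout so one slow endpoint cannot use up the whole Lambda budget
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
        return await response.json()


async def fetch_all(urls):
    # Share one session so connections are reused across requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_data(session, url) for url in urls))


def lambda_handler(event, context):
    urls = event.get("urls", [])  # assumes the caller passes a list of URLs
    return asyncio.run(fetch_all(urls))
```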
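The benefit of `asyncio.gather` is that total wait time tracks the slowest call rather than the sum of all calls. The stdlib-only sketch below shows this with `asyncio.sleep` standing in for network I/O (the `fake_fetch` coroutine and the delays are illustrative) and wraps each call in `asyncio.wait_for` so one hung request cannot stall the whole batch.

```python
import asyncio
import time


async def fake_fetch(delay_s):
    # Stand-in for a network call: just waits for delay_s seconds
    await asyncio.sleep(delay_s)
    return delay_s


async def fetch_all(delays, per_call_timeout_s):
    async def guarded(delay_s):
        try:
            return await asyncio.wait_for(fake_fetch(delay_s), timeout=per_call_timeout_s)
        except asyncio.TimeoutError:
            return None  # treat a slow call as a miss instead of failing the batch

    return await asyncio.gather(*(guarded(d) for d in delays))


start = time.monotonic()
results = asyncio.run(fetch_all([0.1, 0.2, 0.3, 5.0], per_call_timeout_s=0.5))
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")
```

Here the batch finishes in roughly the per-call timeout (about 0.5 s) rather than the 5.6 s the same calls would take back to back with no timeouts.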

  3. Use connection pooling for database connections:

```python
import os
from pymongo import MongoClient

# Initialize outside the handler so the client (and its connection pool)
# is reused across invocations in the same execution environment
client = None


def get_client():
    global client
    if client is None:
        client = MongoClient(os.environ["MONGODB_URI"])
    return client


def lambda_handler(event, context):
    db = get_client().get_database()  # default database named in the URI
    # Use db connection here
```
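The same reuse-across-invocations pattern can be written without a `global` by caching the factory with `functools.lru_cache`. A sketch; `make_client` is a hypothetical stand-in for `MongoClient(...)` or any other expensive constructor, and the `created` list exists only to make the reuse visible.

```python
import functools

created = []  # records constructions so the reuse is visible


def make_client():
    # Hypothetical expensive constructor (e.g. opening a database connection pool)
    client = object()
    created.append(client)
    return client


@functools.lru_cache(maxsize=None)
def get_client():
    # The first call builds the client; later calls in the same
    # execution environment return the cached instance
    return make_client()


def lambda_handler(event, context):
    client = get_client()
    # Use client here
    return {"ok": True}
```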

  4. Set appropriate timeouts for external calls:

```python
import requests


def lambda_handler(event, context):
    # Always set a timeout shorter than the Lambda timeout
    response = requests.get(
        "https://api.example.com/data",
        timeout=10,  # the Lambda timeout should exceed 10 s plus a buffer
    )
    response.raise_for_status()
    return response.json()
```
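Rather than hard-coding the HTTP timeout, it can be derived from the time the invocation has left, so the external call always gives up before Lambda does. A sketch; `request_timeout_s`, the 10-second cap, and the 1-second buffer are assumptions, and `FakeContext` only mimics the `get_remaining_time_in_millis()` method of the real context object for local testing.

```python
def request_timeout_s(context, cap_s=10.0, buffer_s=1.0):
    """Seconds to allow an outbound call: capped, and always leaving buffer_s spare."""
    remaining_s = context.get_remaining_time_in_millis() / 1000
    timeout = min(cap_s, remaining_s - buffer_s)
    if timeout <= 0:
        raise TimeoutError("not enough time left for an outbound call")
    return timeout


class FakeContext:
    # Minimal stand-in for the Lambda context object
    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

In a handler this would be used as `requests.get(url, timeout=request_timeout_s(context))`.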

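  5. Use AWS Lambda Power Tuning to find the optimal memory setting:

```bash
# Lambda Power Tuning runs as a Step Functions state machine; deploy it first
# (for example from the AWS Serverless Application Repository), then start an execution.
# The state machine ARN below is a placeholder for your deployment's ARN.
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:region:account:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:region:account:function:my-function",
    "powerValues": [128, 256, 512, 1024, 2048],
    "num": 50
  }'
# Retrieve the results once the execution finishes:
# aws stepfunctions describe-execution --execution-arn <executionArn from the output above>
```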
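Power Tuning compares cost and speed across memory sizes. Because Lambda bills compute in GB-seconds (memory times duration), the trade-off can be reasoned about with a few lines of arithmetic. A sketch; the `measurements` values are hypothetical, not real benchmark results, and the price per GB-second is omitted because it scales every option equally.

```python
def gb_seconds(memory_mb, duration_ms):
    # Lambda compute is billed by memory (GB) multiplied by duration (s)
    return (memory_mb / 1024) * (duration_ms / 1000)


def cheapest_within(measurements, max_duration_ms):
    """Pick the lowest-cost memory size whose average duration meets max_duration_ms."""
    candidates = [
        (gb_seconds(mem, dur), mem)
        for mem, dur in measurements.items()
        if dur <= max_duration_ms
    ]
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical average durations (ms) per memory size (MB), for illustration only
measurements = {128: 9000, 256: 4200, 512: 2000, 1024: 1100, 2048: 1000}
```

With these made-up numbers, 512 MB is cheapest overall, but a 1.5-second latency target would push the choice to 1024 MB: more memory can cost less when it shortens execution enough.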
  6. Implement graceful timeout handling:

```python
import signal


class TimeoutException(Exception):
    pass


def timeout_handler(signum, frame):
    raise TimeoutException("Function timing out")


def lambda_handler(event, context):
    # Set signal handler
    signal.signal(signal.SIGALRM, timeout_handler)
    # Set alarm 2 seconds before the Lambda timeout (alarm() takes whole seconds, minimum 1)
    seconds_left = context.get_remaining_time_in_millis() // 1000
    signal.alarm(max(seconds_left - 2, 1))

    try:
        # Your code here
        pass
    except TimeoutException:
        # Clean up and save state
        return {"status": "timeout", "partial": True}
    finally:
        # Cancel the alarm so it cannot fire during a later invocation
        signal.alarm(0)
```
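An alternative that avoids signals (SIGALRM only works in the main thread and allows one pending alarm) is to check `context.get_remaining_time_in_millis()` between units of work and return the unfinished items so a retry or follow-up invocation can resume. A sketch; `process_batch`, `SAFETY_MARGIN_MS`, and `FakeContext` are hypothetical, with `FakeContext` simulating a clock that loses one second per check.

```python
SAFETY_MARGIN_MS = 2000  # stop this long before the Lambda timeout


def process_batch(items, context, handle):
    """Process items until time runs short; return (done, remaining)."""
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return done, items[index:]  # hand back unprocessed work
        done.append(handle(item))
    return done, []


class FakeContext:
    # Simulates remaining time that drops by 1000 ms on every check
    def __init__(self, start_ms):
        self._remaining_ms = start_ms

    def get_remaining_time_in_millis(self):
        current = self._remaining_ms
        self._remaining_ms -= 1000
        return current
```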

  7. Monitor and alert on timeout patterns:

```bash
# CloudWatch alarm on the Errors metric (counts timeouts and all other errors)
aws cloudwatch put-metric-alarm \
  --alarm-name lambda-timeout-alarm \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=FunctionName,Value=my-function
```
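Because the Errors metric counts every failure, not only timeouts, it can be worth isolating timeout messages specifically (a CloudWatch Logs metric filter on the same phrase is the managed equivalent). A sketch that counts them in a list of log messages, for example messages fetched from CloudWatch Logs; the regex assumes the "Task timed out after X.XX seconds" wording listed under Symptoms, and `timeout_stats` is a hypothetical helper.

```python
import re

TIMEOUT_RE = re.compile(r"Task timed out after (\d+(?:\.\d+)?) seconds")


def timeout_stats(messages):
    """Return (number of timeout messages, longest reported duration in seconds)."""
    seconds = [float(m.group(1)) for m in map(TIMEOUT_RE.search, messages) if m]
    return len(seconds), max(seconds, default=None)
```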

Prevention

  • Set the function timeout to the p99 execution time plus about 20%
  • Use async patterns for I/O operations
  • Implement connection pooling for databases
  • Monitor cold start duration with X-Ray
  • Set provisioned concurrency for latency-sensitive functions
  • Use Step Functions for long-running workflows (>15 min)
  • Run regular performance tests under realistic load
  • Set up CloudWatch alarms for error rates
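The p99-plus-20% rule translates directly into code. A sketch using the nearest-rank percentile, assuming durations in milliseconds collected from REPORT lines or the CloudWatch Duration metric; the function names are hypothetical, and the result is rounded up to whole seconds and capped at Lambda's 15-minute maximum.

```python
import math

MAX_TIMEOUT_S = 900  # Lambda's 15-minute ceiling


def p99(durations_ms):
    """Nearest-rank 99th percentile of a non-empty list of durations."""
    if not durations_ms:
        raise ValueError("need at least one duration sample")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def recommended_timeout_s(durations_ms, headroom=0.20):
    """p99 plus headroom, rounded up to whole seconds, between 1 s and 900 s."""
    target_ms = p99(durations_ms) * (1 + headroom)
    return min(MAX_TIMEOUT_S, max(1, math.ceil(target_ms / 1000)))
```

For 100 samples spread evenly from 100 ms to 10,000 ms, p99 is 9,900 ms, and adding 20% gives 11.88 s, which rounds up to a 12-second timeout.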

VPC-Specific Optimizations

  • Use VPC endpoints for AWS services to avoid routing that traffic through a NAT Gateway
  • Consider running the function outside a VPC if it only calls AWS services that use IAM authentication
  • Use RDS Proxy for database connections
  • Implement keep-alive for established connections
  • Use ElastiCache for VPC-local caching