What's Actually Happening
You enabled SnapStart for your Java Lambda function expecting near-instant cold starts, but functions still experience slow initialization, or SnapStart fails to create snapshots entirely. The latency improvements don't materialize as expected.
The Error You'll See
SnapStart not enabled:
```bash
$ aws lambda get-function-configuration --function-name my-function
{
  "FunctionName": "my-function",
  "Runtime": "java17",
  "SnapStart": {
    "ApplyOn": "None",            # Should be "PublishedVersions"
    "OptimizationStatus": "Off"
  }
}
```
Snapshot creation failed:
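```bash
$ aws lambda publish-version --function-name my-function \
    --query '{Version:Version, State:State}'
{
    "Version": "1",
    "State": "Pending"            # Lambda is running init and taking the snapshot
}

$ aws lambda get-function-configuration --function-name my-function --qualifier 1 \
    --query '{State:State, Reason:StateReason}'
# "State": "Failed" means init threw an exception or ran past its time limit
# (130 seconds or the function timeout, whichever is higher);
# StateReason and the function's CloudWatch Logs carry the details
```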
Cold start still slow:
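```bash
# CloudWatch Logs for a cold start restored from the snapshot:
RESTORE_START Runtime Version: java:17.v... Runtime Version ARN: arn:aws:lambda:...
START RequestId: ... Version: 1
REPORT RequestId: ... Duration: ... ms ... Restore Duration: 500.00 ms Billed Restore Duration: ... ms
# Restore Duration covers loading the snapshot and running afterRestore hooks

# Or no snapshot restore at all (for example, when invoking $LATEST):
INIT_START Runtime Version: java:17.v... Runtime Version ARN: arn:aws:lambda:...
START RequestId: ... Version: $LATEST
REPORT RequestId: ... Duration: ... ms ... Init Duration: 8500.00 ms   # No snapshot used!
```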
Why This Happens
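1. Runtime or configuration not supported - SnapStart for Java needs a Java 11 or later managed runtime (java11, java17, java21) and can't be combined with Provisioned Concurrency, Amazon EFS, or ephemeral storage larger than 512 MB
2. Not publishing versions - SnapStart only applies to published versions (and aliases that point to them); $LATEST always runs a regular init
3. Init failure or timeout - Initialization throws, or exceeds its limit of 130 seconds or the function timeout, whichever is higher
4. Memory too low - Lambda allocates CPU in proportion to memory, so a low-memory Java function initializes slowly and is more likely to hit the init limit
5. Stale network connections - Connections opened during init are frozen into the snapshot and may be broken after restore
6. Random number issues - Generators seeded during init (such as java.util.Random) produce the same values in every environment restored from the same snapshot
7. Thread pools not recreated - Threads started during init are frozen into the snapshot mid-task; pools that should start clean need to be shut down before the checkpoint and recreated after restore
8. Native code used - JNI libraries can keep their own sockets or random state that Java-level runtime hooks cannot see or reset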
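Item 6 is the least intuitive cause. As a local analogy (Lambda snapshots the whole execution environment's memory, not individual objects), Java serialization can play the role of the snapshot: the sketch below captures a `java.util.Random` once and "restores" it twice, and both copies then return the same values, just as two environments restored from one snapshot would. `SnapshotAnalogy` is a hypothetical class written only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Random;

// Hypothetical demo: Java serialization stands in for a SnapStart snapshot.
final class SnapshotAnalogy {

    // "Take the snapshot": capture the generator's internal state as bytes.
    static byte[] snapshot(Random random) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(random);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Restore" one execution environment from the snapshot bytes.
    static Random restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Random) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] snap = snapshot(new Random());   // Random created during "init"
        Random envA = restore(snap);            // two environments restored
        Random envB = restore(snap);            // from the same snapshot
        // Prints the same value twice: the generator state was frozen into the snapshot.
        System.out.println(envA.nextInt() + " " + envB.nextInt());
    }
}
```

Running `main` prints the same number twice, which is exactly the duplicate-ID symptom Step 4 addresses.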
Step 1: Verify SnapStart Requirements
```bash
# Check function runtime (must be a Java 11 or later managed runtime):
aws lambda get-function-configuration --function-name my-function \
  --query 'Runtime' --output text
# Expected: java11, java17, or java21

# Check function memory (more memory also means more CPU for init):
aws lambda get-function-configuration --function-name my-function \
  --query 'MemorySize' --output text
# 1024-2048 MB is a reasonable starting point for Java

# Check function timeout:
aws lambda get-function-configuration --function-name my-function \
  --query 'Timeout' --output text
# SnapStart init may run for 130 seconds or this timeout, whichever is higher

# Check architecture:
aws lambda get-function-configuration --function-name my-function \
  --query 'Architectures' --output text
# x86_64 is supported; confirm arm64 support in the current SnapStart docs

# Check for Provisioned Concurrency (incompatible with SnapStart):
aws lambda list-provisioned-concurrency-configs --function-name my-function
# Expected: "ProvisionedConcurrencyConfigs": []

# Verify all requirements at once (EFS and ephemeral storage > 512 MB are also incompatible):
aws lambda get-function-configuration --function-name my-function \
  --query '{Runtime:Runtime, Memory:MemorySize, Timeout:Timeout, Arch:Architectures, Efs:FileSystemConfigs, TmpMB:EphemeralStorage.Size}'
```
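The same checks can be expressed as a small function for a deployment script or test. This is a sketch: `FunctionConfig` is a hypothetical record you would fill from the `get-function-configuration` and `list-provisioned-concurrency-configs` output, the runtime set covers only the Java runtimes named in this article, and the init limit assumes AWS's documented SnapStart rule of 130 seconds or the function timeout, whichever is higher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical input: fill from get-function-configuration and
// list-provisioned-concurrency-configs output.
record FunctionConfig(String runtime, int memoryMb, int timeoutSeconds,
                      boolean hasProvisionedConcurrency, boolean hasEfs,
                      int ephemeralStorageMb) {}

final class SnapStartPreflight {

    // Java managed runtimes named in this article; extend as AWS adds more.
    private static final Set<String> JAVA_RUNTIMES = Set.of("java11", "java17", "java21");

    // Returns human-readable problems; an empty list means nothing to fix.
    static List<String> problems(FunctionConfig c) {
        List<String> issues = new ArrayList<>();
        if (!JAVA_RUNTIMES.contains(c.runtime())) {
            issues.add("runtime " + c.runtime() + " is not a SnapStart Java runtime");
        }
        if (c.hasProvisionedConcurrency()) {
            issues.add("Provisioned Concurrency cannot be used with SnapStart");
        }
        if (c.hasEfs()) {
            issues.add("Amazon EFS cannot be used with SnapStart");
        }
        if (c.ephemeralStorageMb() > 512) {
            issues.add("ephemeral storage above 512 MB cannot be used with SnapStart");
        }
        if (c.memoryMb() < 1024) {
            // Advisory, not a hard limit: less memory means less CPU for init.
            issues.add("memory below 1024 MB may make init slow");
        }
        return issues;
    }

    // Assumed SnapStart init limit: 130 s or the function timeout, whichever is higher.
    static int initLimitSeconds(FunctionConfig c) {
        return Math.max(130, c.timeoutSeconds());
    }
}
```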
Step 2: Enable SnapStart Correctly
```bash
# Enable SnapStart on the function:
aws lambda update-function-configuration \
  --function-name my-function \
  --snap-start ApplyOn=PublishedVersions

# Wait for the update to finish before publishing:
aws lambda wait function-updated --function-name my-function

# Verify SnapStart is enabled:
aws lambda get-function-configuration --function-name my-function \
  --query 'SnapStart'
# Output for the unqualified function ($LATEST):
# {
#   "ApplyOn": "PublishedVersions",
#   "OptimizationStatus": "Off"    # stays Off here; only published versions get snapshots
# }

# CRITICAL: Must publish a version for SnapStart to activate:
aws lambda publish-version \
  --function-name my-function \
  --description "SnapStart enabled version"
# Note the version number:
# {
#   "FunctionName": "my-function",
#   "Version": "1",
#   ...
# }

# Wait until the snapshot is ready (the version stays "Pending" until then):
aws lambda wait published-version-active --function-name my-function --qualifier 1

# Verify the snapshot is active for the version:
aws lambda get-function-configuration \
  --function-name my-function \
  --qualifier 1 \
  --query 'SnapStart'
# Expected: "OptimizationStatus": "On"

# Invoke the version (not $LATEST); AWS CLI v2 needs --cli-binary-format for raw JSON:
aws lambda invoke \
  --function-name my-function:1 \
  --cli-binary-format raw-in-base64-out \
  --payload '{"test": "data"}' \
  response.json

# Check CloudWatch Logs for a RESTORE_START line and "Restore Duration" in the REPORT line
```
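`publish-version` returns while the version is still `Pending`, and the CLI waiter polls until the snapshot is ready. If a deployment tool written in Java drives this instead, the polling loop looks roughly like the sketch below. The state source is abstracted as a `Supplier<String>` so the sketch runs without the AWS SDK; in practice it would wrap a GetFunctionConfiguration call for the version and return its `State`. `VersionWaiter` and its parameters are hypothetical.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, conceptually similar to `aws lambda wait published-version-active`.
final class VersionWaiter {

    // Polls the version's State until it is "Active" (returns true) or "Failed"
    // (returns false); throws if it is still not settled after maxAttempts polls.
    static boolean awaitActive(Supplier<String> stateOfVersion, int maxAttempts, Duration delay)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            switch (stateOfVersion.get()) {
                case "Active":
                    return true;
                case "Failed":
                    return false; // check StateReason and the init logs
                default:
                    // Normally "Pending": Lambda is still running init and taking the snapshot.
                    Thread.sleep(delay.toMillis());
            }
        }
        throw new IllegalStateException("version not Active after " + maxAttempts + " polls");
    }
}
```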
Step 3: Fix Initialization Code for SnapStart
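```java
// BEFORE: Problematic initialization
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // Problem 1: Data fetched at init is frozen into the snapshot and shared by every
    // restored environment (fine for static config, wrong for data that must be fresh)
    private static final String CONFIG = fetchConfigFromS3();

    // Problem 2: Thread pool started during init is captured in the snapshot
    private static final ExecutorService executor = Executors.newFixedThreadPool(10);

    // Problem 3: java.util.Random seeded at init repeats across restored environments
    private static final Random random = new Random();

    // Problem 4: Open file descriptor captured in the snapshot
    private static final FileInputStream file;

    static {
        try {
            file = new FileInputStream("/tmp/data.txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Handler code
        return "OK";
    }

    private static String fetchConfigFromS3() {
        // Runs during init, i.e. once at publish time for a SnapStart version
        return ""; // placeholder: S3 GetObject call goes here
    }
}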
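// AFTER: SnapStart-compatible initialization
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import java.util.UUID;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // SDK clients can be built during init; AWS notes that SDK-managed
    // connections resume after restore in most cases
    private static final S3Client s3 = S3Client.create();

    // Lazy initialization: fetched per execution environment, after restore
    private String config;

    // Thread pools: create them only when needed, or recreate them after restore (Step 7)

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Lazy load configuration (Lambda sends one request at a time per environment)
        if (config == null) {
            config = fetchConfigFromS3();
        }

        // UUID.randomUUID() is backed by SecureRandom, so IDs differ after restore
        String id = UUID.randomUUID().toString();

        return processRequest(input, id);
    }

    private String fetchConfigFromS3() {
        // Fetch in the handler, not in static init
        return s3.getObjectAsBytes(GetObjectRequest.builder()
                .bucket("my-bucket")
                .key("config.json")
                .build())
            .asUtf8String();
    }

    private String processRequest(Map<String, Object> input, String id) {
        return "OK " + id; // placeholder for business logic
    }
}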
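// Use CRaC (Coordinated Restore at Checkpoint) runtime hooks for custom logic.
// Package org.crac, Maven artifact io.github.crac:org-crac
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import org.crac.Core;
import org.crac.Resource;

public class SnapStartAwareHandler
        implements RequestHandler<Map<String, Object>, String>, Resource {

    private volatile boolean initialized = false;

    public SnapStartAwareHandler() {
        // Register so Lambda calls the hooks below around the snapshot
        Core.getGlobalContext().register(this);
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        if (!initialized) {
            // Initialize once per execution environment; also covers $LATEST,
            // where no restore (and no afterRestore hook) happens
            initialize();
            initialized = true;
        }
        return handle(input);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // Runs once, before Lambda takes the snapshot
        closeConnections();
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // Runs in every execution environment restored from the snapshot
        reinitializeConnections();
    }

    private void initialize() { /* per-environment setup */ }

    private void closeConnections() { /* close sockets, pools, files */ }

    private void reinitializeConnections() { /* reopen what closeConnections() closed */ }

    private String handle(Map<String, Object> input) { return "OK"; }
}
```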
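The lazy `config` check in the AFTER example is safe because Lambda sends an execution environment one request at a time. If background threads can also read the value, a thread-safe holder with double-checked locking loads it exactly once per environment. A sketch; `Lazy` is a hypothetical helper, not part of the Lambda or AWS SDK libraries.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: computes a value on first use, at most once, thread-safely.
final class Lazy<T> implements Supplier<T> {

    private final Supplier<? extends T> loader;
    private volatile T value; // volatile publishes the loaded value to all threads

    Lazy(Supplier<? extends T> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {                 // fast path: no locking once loaded
            synchronized (this) {
                result = value;
                if (result == null) {         // re-check under the lock
                    result = Objects.requireNonNull(loader.get(), "loader returned null");
                    value = result;
                }
            }
        }
        return result;
    }
}
```

In a handler this would be a field such as `private final Lazy<String> config = new Lazy<>(this::fetchConfigFromS3);`, read with `config.get()` inside `handleRequest`.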
Step 4: Handle Unique IDs and Random Numbers
```java
// PROBLEM: Random numbers and IDs seeded at init repeat after restore
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // BAD: Random seeded during init and restored from the snapshot
    private final Random random = new Random();

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // BAD: May return the same "random" value in every restored environment
        int rand = random.nextInt();

        // GOOD: UUID.randomUUID() uses SecureRandom, which AWS's SnapStart
        // uniqueness guidance treats as snap-resilient in the managed Java runtimes
        String uuid = UUID.randomUUID().toString();

        return "ID: " + uuid;
    }
}
// Use a CRaC runtime hook (org.crac.Resource) for custom reset logic
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    private SecureRandom secureRandom;

    public MyHandler() {
        secureRandom = new SecureRandom();
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Called before the snapshot is taken: drop the generator
        secureRandom = null;
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Called after restore: create a new generator for this environment
        if (secureRandom == null) {
            secureRandom = new SecureRandom();
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Defensive: recreate if the hooks left no generator
        if (secureRandom == null) {
            secureRandom = new SecureRandom();
        }

        byte[] randomBytes = new byte[16];
        secureRandom.nextBytes(randomBytes);

        return Base64.getEncoder().encodeToString(randomBytes);
    }
}
```
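If you keep `java.util.Random` (for speed or an existing API), the hook pattern reduces to reseeding the generator after restore. A sketch under that assumption: `RestorableRandom` is hypothetical, its `afterRestore()` is a plain method meant to be called from your CRaC `afterRestore` hook, and the fresh seed is drawn from `SecureRandom`.

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical holder: wraps java.util.Random and reseeds it on demand.
// Call afterRestore() from your CRaC afterRestore hook.
final class RestorableRandom {

    private volatile Random random;

    RestorableRandom(Random initial) {
        this.random = initial;
    }

    // Give this restored environment its own seed, drawn from SecureRandom.
    void afterRestore() {
        random = new Random(new SecureRandom().nextLong());
    }

    int nextInt(int bound) {
        return random.nextInt(bound);
    }

    long nextLong() {
        return random.nextLong();
    }
}
```

Two environments restored from one snapshot start with identical generators; after each calls `afterRestore()`, their sequences diverge.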
Step 5: Fix Network and Resource Connections
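```java
// PROBLEM: Connections restored from the snapshot can be stale
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    private static final String DB_URL = System.getenv("DB_URL");

    // BAD: Connection opened in a static initializer is frozen into the snapshot
    // private static final Connection dbConnection;
    // static { dbConnection = DriverManager.getConnection(DB_URL); }

    // GOOD: Lazy connection, validated before each use
    private Connection connection;
    private final Object connectionLock = new Object();

    private Connection getConnection() throws SQLException {
        synchronized (connectionLock) {
            if (connection == null || !connection.isValid(1)) {
                connection = DriverManager.getConnection(DB_URL);
            }
            return connection;
        }
    }

    // GOOD: Connection pool; HikariCP checks that an idle connection is alive
    // before handing it out, so connections that went stale are replaced
    private static final HikariDataSource dataSource;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(DB_URL);
        config.setMaximumPoolSize(5);
        config.setConnectionTimeout(10000);
        config.setValidationTimeout(1000);
        config.setConnectionTestQuery("SELECT 1");
        dataSource = new HikariDataSource(config);
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        try (Connection conn = dataSource.getConnection()) {
            // Connection has been validated by the pool
            return executeQuery(conn, input);
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    private String executeQuery(Connection conn, Map<String, Object> input) throws SQLException {
        return "OK"; // placeholder for your query
    }
}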
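// AWS SDK clients: AWS notes that, in most cases, connections an AWS SDK
// establishes resume automatically after restore
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import java.util.stream.Collectors;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Bucket;

public class S3Handler implements RequestHandler<Map<String, Object>, String> {

    // GOOD: Build the client once during init
    private final S3Client s3 = S3Client.builder()
            .region(Region.US_EAST_1)
            .build();

    // The SDK also retries transient failures by default, which covers the
    // occasional connection that did not survive the restore

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        return s3.listBuckets().buckets().stream()
                .map(Bucket::name)
                .collect(Collectors.joining(","));
    }
}
```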
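The lazy `getConnection()` above is one instance of a general pattern: hand out a cached resource, but check it first and rebuild it if the check fails. A generic sketch; `ValidatingHolder` is hypothetical, and for JDBC the health check would wrap `c.isValid(1)` and the factory would wrap `DriverManager.getConnection`, both converting the checked `SQLException`.

```java
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: returns a cached resource, replacing it when its health check fails.
final class ValidatingHolder<T> {

    private final Supplier<? extends T> factory;
    private final Predicate<? super T> isHealthy;
    private final Consumer<? super T> closer;
    private T current;

    ValidatingHolder(Supplier<? extends T> factory, Predicate<? super T> isHealthy,
                     Consumer<? super T> closer) {
        this.factory = Objects.requireNonNull(factory);
        this.isHealthy = Objects.requireNonNull(isHealthy);
        this.closer = Objects.requireNonNull(closer);
    }

    // synchronized: concurrent callers never build two replacements at once.
    synchronized T get() {
        if (current != null && !isHealthy.test(current)) {
            closer.accept(current); // discard the stale one (e.g. broken by a restore)
            current = null;
        }
        if (current == null) {
            current = factory.get();
        }
        return current;
    }
}
```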
Step 6: Debug SnapStart Initialization Issues
```bash
# Add your own logging: the Lambda runtime does not read Spring-style
# "logging.level.*" properties. Print from static init, constructors and the
# beforeCheckpoint/afterRestore hooks to see how far initialization gets.

# Markers to look for in CloudWatch Logs:
#   INIT_START     init ran (at publish time for a SnapStart version)
#   RESTORE_START  a cold start was served from the snapshot
#   REPORT         per-invocation summary with Init Duration or Restore Duration

# Test with the published version:
aws lambda invoke --function-name my-function:1 out.json

# A restored cold start reports, for example:
# REPORT RequestId: ... Duration: ... ms ... Restore Duration: ... ms Billed Restore Duration: ... ms

# Test without SnapStart (use $LATEST):
aws lambda invoke --function-name my-function out.json
# A regular cold start reports "Init Duration: ... ms" instead

# Compare: Restore Duration should be well below the Init Duration it replaces

# Check snapshot status for the version:
aws lambda get-function-configuration \
  --function-name my-function \
  --qualifier 1 \
  --query '{State:State, Reason:StateReason, SnapStart:SnapStart}'
# Expected: "State": "Active" and "OptimizationStatus": "On"

# If the snapshot failed:
# - State is "Failed" and StateReason explains why
# - Look for an exception in the init logs, or init running past its limit
#   (130 seconds or the function timeout, whichever is higher)
```
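To compare many cold starts instead of reading single log lines, REPORT lines can be classified programmatically. A sketch, assuming REPORT lines in which a restored cold start carries `Restore Duration: N ms` and a regular cold start carries `Init Duration: N ms`; `ReportLine` is a hypothetical helper, and the lookbehind keeps `Billed Restore Duration` from being read as the restore time.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a Lambda REPORT log line by its duration fields.
final class ReportLine {

    enum Kind { RESTORE, INIT, WARM }

    private static final Pattern INIT = Pattern.compile("Init Duration: ([0-9.]+) ms");
    // The lookbehind skips "Billed Restore Duration", which would otherwise also match.
    private static final Pattern RESTORE =
            Pattern.compile("(?<!Billed )Restore Duration: ([0-9.]+) ms");

    static OptionalDouble initDurationMs(String line) {
        return find(INIT, line);
    }

    static OptionalDouble restoreDurationMs(String line) {
        return find(RESTORE, line);
    }

    static Kind kind(String line) {
        if (restoreDurationMs(line).isPresent()) {
            return Kind.RESTORE; // cold start served from a SnapStart snapshot
        }
        if (initDurationMs(line).isPresent()) {
            return Kind.INIT;    // regular cold start
        }
        return Kind.WARM;
    }

    private static OptionalDouble find(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group(1))) : OptionalDouble.empty();
    }
}
```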
Step 7: Handle Thread Pools and Executors
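```java
// PROBLEM: Thread pools started during init are frozen into the snapshot
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    // BAD: Static thread pool created during init
    // private static final ExecutorService executor = Executors.newFixedThreadPool(10);

    // GOOD: Shut the pool down before the snapshot and recreate it after restore
    private ExecutorService executor;

    public MyHandler() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Shut down threads before the snapshot
        if (executor != null) {
            executor.shutdownNow();
            try {
                executor.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            executor = null;
        }
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Recreate the thread pool after restore
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newFixedThreadPool(10);
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Ensure the executor is ready (also covers $LATEST, where hooks never run)
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newFixedThreadPool(10);
        }

        Future<String> future = executor.submit(() -> processAsync(input));
        try {
            return future.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new RuntimeException(e);
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true);
            throw new RuntimeException(e);
        }
    }

    private String processAsync(Map<String, Object> input) {
        return "done"; // placeholder for the async work
    }
}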
// Simpler alternative: no dedicated pool of your own
import java.util.concurrent.CompletableFuture;

public class SimpleHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // supplyAsync runs on the JVM's shared ForkJoinPool.commonPool(), whose worker
        // threads start on demand, so there is no pool of your own to shut down or recreate
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> processAsync(input));

        return future.join();
    }

    private String processAsync(Map<String, Object> input) {
        return "done"; // placeholder for the async work
    }
}
```
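The checkpoint and restore logic above can be pulled out of the handler into a small reusable holder. A sketch; `RestartableExecutor` is hypothetical, and its `beforeCheckpoint()` and `afterRestore()` are plain methods you would call from your CRaC hooks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a fixed-size pool that can be torn down and rebuilt.
final class RestartableExecutor {

    private final int threads;
    private ExecutorService executor;

    RestartableExecutor(int threads) {
        this.threads = threads;
    }

    // Returns a live pool, creating one if none exists or the old one was shut down.
    synchronized ExecutorService executor() {
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newFixedThreadPool(threads);
        }
        return executor;
    }

    // Call from beforeCheckpoint: stop the threads so none are frozen mid-task.
    synchronized void beforeCheckpoint() throws InterruptedException {
        if (executor != null) {
            executor.shutdownNow();
            executor.awaitTermination(1, TimeUnit.SECONDS);
            executor = null;
        }
    }

    // Call from afterRestore: recreate the pool up front.
    synchronized void afterRestore() {
        executor();
    }
}
```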
Step 8: Test SnapStart Performance
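The script in this step prints one client-side timing per invocation. With only ten samples, a nearest-rank percentile says more than an average, and it is simple to compute over the printed milliseconds. A sketch; `Percentiles` is a hypothetical helper.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over measured latencies.
final class Percentiles {

    // Smallest sample such that at least p percent of samples are <= it.
    static long nearestRank(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For the ten cold-start timings, `nearestRank(samples, 50)` gives the median and `nearestRank(samples, 90)` the p90.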
```bash
# Create test script:
cat << 'EOF' > test-snapstart.sh
#!/bin/bash
set -euo pipefail

FUNCTION="my-function"
ITERATIONS=10

echo "=== Testing SnapStart Performance ==="

echo -e "\nCold Start Tests (with SnapStart):"
for i in $(seq 1 $ITERATIONS); do
  # Change the configuration so the next version needs a fresh snapshot.
  # Caution: --environment replaces ALL of the function's environment variables.
  aws lambda update-function-configuration \
    --function-name "$FUNCTION" \
    --environment "Variables={TEST_VAR=$i}" > /dev/null

  # Wait for the configuration update to finish
  aws lambda wait function-updated --function-name "$FUNCTION"

  # Publish a new version and capture its number
  VERSION=$(aws lambda publish-version \
    --function-name "$FUNCTION" \
    --description "Test version $i" \
    --query Version --output text)

  # Wait until Lambda has created the snapshot for this version
  aws lambda wait published-version-active \
    --function-name "$FUNCTION" --qualifier "$VERSION"

  # The first invoke of a new version is a restore; client-side timing
  # also includes network and API overhead
  START=$(date +%s%N)    # GNU date; %N is not supported on macOS
  aws lambda invoke --function-name "$FUNCTION:$VERSION" out.json > /dev/null
  END=$(date +%s%N)

  DURATION=$(( (END - START) / 1000000 ))
  echo "Cold start $i: ${DURATION}ms"
done

echo -e "\nWarm Start Tests:"
for i in $(seq 1 $ITERATIONS); do
  START=$(date +%s%N)
  aws lambda invoke --function-name "$FUNCTION:$VERSION" out.json > /dev/null
  END=$(date +%s%N)

  DURATION=$(( (END - START) / 1000000 ))
  echo "Warm start $i: ${DURATION}ms"
done

echo -e "\nCheck CloudWatch for detailed metrics"
EOF
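
chmod +x test-snapstart.sh

# Run tests:
./test-snapstart.sh

# For server-side timings, query the function's log group in
# CloudWatch Logs Insights, starting from:
#   filter @message like /INIT_START/ or @message like /RESTORE_START/
```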
Step 9: Monitor SnapStart Metrics
```bash
# Lambda does not publish SnapStart restore or init durations as standard
# AWS/Lambda metrics; each cold start reports them in its REPORT log line.

# 1. Summarize restores and regular inits with CloudWatch Logs Insights:
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/lambda/my-function \
  --start-time $(date -u -d '-1 hour' +%s) \
  --end-time $(date -u +%s) \
  --query-string 'filter @type = "REPORT"
    | parse @message /Restore Duration: (?<restoreDuration>[0-9.]+) ms/
    | stats count(restoreDuration) as restores, avg(restoreDuration) as avgRestoreMs,
            count(@initDuration) as inits, avg(@initDuration) as avgInitMs by bin(5m)' \
  --query queryId --output text)

# 2. Fetch the results (repeat until "status" is "Complete"):
aws logs get-query-results --query-id "$QUERY_ID"

# 3. Alarm on slow restores. This assumes you publish the restore duration
#    yourself as a custom metric (hypothetical namespace Custom/SnapStart,
#    metric RestoreDuration), for example from a log metric filter:
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-SnapStart-SlowRestore" \
  --alarm-description "Alert when SnapStart restore exceeds 500ms" \
  --namespace Custom/SnapStart \
  --metric-name RestoreDuration \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Average \
  --period 60 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --treat-missing-data notBreaching

# 4. Look for SnapStart-related errors in the function logs (invokers may also
#    receive SnapStartException, SnapStartTimeoutException or SnapStartNotReadyException):
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "SnapStart" \
  --start-time $(date -u -d '-1 hour' +%s)000

# 5. Use Lambda Insights for detailed performance. The layer version differs by
#    Region and over time, --layers replaces the function's existing layers, and
#    the execution role needs the CloudWatchLambdaInsightsExecutionRolePolicy:
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:21
```
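A simplified model of when the slow-restore alarm in this step fires, assuming the default where datapoints-to-alarm equals evaluation-periods: the latest three one-minute averages must each exceed 500 ms, and because of `--treat-missing-data notBreaching` a minute with no data counts as OK. `AlarmModel` is hypothetical; CloudWatch's real evaluation handles more cases (for example M-out-of-N settings).

```java
import java.util.List;

// Hypothetical, simplified model of a GreaterThanThreshold alarm with notBreaching.
final class AlarmModel {

    // periodAverages: oldest to newest; null means no data in that period.
    static boolean inAlarm(List<Double> periodAverages, double thresholdMs, int evaluationPeriods) {
        if (periodAverages.size() < evaluationPeriods) {
            return false; // missing periods count as not breaching
        }
        List<Double> window =
                periodAverages.subList(periodAverages.size() - evaluationPeriods, periodAverages.size());
        for (Double avg : window) {
            // notBreaching: a period without data is treated as within the threshold
            if (avg == null || avg <= thresholdMs) {
                return false;
            }
        }
        return true; // every evaluated period exceeded the threshold
    }
}
```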
Step 10: Implement Best Practices for SnapStart
```java
// Complete SnapStart-compatible Lambda function:
package com.example;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.util.Map;
import java.util.UUID;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.services.s3.S3Client;

public class ProductionHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    // Lazy-loaded configuration
    private volatile String config;
    private final Object configLock = new Object();

    // AWS SDK client: safe to build during init
    private final S3Client s3 = S3Client.create();

    // Connection pool, closed before the snapshot and rebuilt after restore
    private volatile HikariDataSource dataSource;

    public ProductionHandler() {
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Cleanup before snapshot
        closeDatabaseConnections();
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Reinitialize after restore
        initializeConnections();
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Ensure initialization
        ensureInitialized();

        // Unique per request, also across restored environments
        String requestId = UUID.randomUUID().toString();

        // Process request
        return processRequest(input, requestId, ctx);
    }

    private void ensureInitialized() {
        if (config == null) {
            synchronized (configLock) {
                if (config == null) {
                    config = loadConfig();
                }
            }
        }
        // Also covers $LATEST, where afterRestore never runs
        initializeConnections();
    }

    private synchronized void closeDatabaseConnections() {
        HikariDataSource ds = dataSource;
        if (ds != null && !ds.isClosed()) {
            ds.close();
        }
        dataSource = null;
    }

    private synchronized void initializeConnections() {
        if (dataSource == null || dataSource.isClosed()) {
            HikariConfig hikariConfig = new HikariConfig();
            hikariConfig.setJdbcUrl(System.getenv("DB_URL"));
            hikariConfig.setMaximumPoolSize(5);
            hikariConfig.setConnectionTimeout(5000);
            hikariConfig.setValidationTimeout(1000);
            dataSource = new HikariDataSource(hikariConfig);
        }
    }

    private String loadConfig() {
        return "{}"; // placeholder: e.g. read config.json with s3
    }

    private String processRequest(Map<String, Object> input, String requestId, Context ctx) {
        return requestId; // placeholder for business logic
    }
}
```

```yaml
# deployment.yaml for CloudFormation:
AWSTemplateFormatVersion: '2010-09-09'
Description: Lambda function with SnapStart

Parameters:
  DatabaseUrl:
    Type: String
  FunctionRoleArn:
    Type: String   # ARN of an existing Lambda execution role (Role is required)

Resources:
  MyFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: my-function
      Runtime: java17
      MemorySize: 1024
      Timeout: 30
      Handler: com.example.ProductionHandler
      Role: !Ref FunctionRoleArn
      Code:
        S3Bucket: my-bucket
        S3Key: function.jar
      SnapStart:
        ApplyOn: PublishedVersions
      Environment:
        Variables:
          DB_URL: !Ref DatabaseUrl

  # Versions are immutable: replace this resource (e.g. a new logical ID)
  # to publish a new version after code changes
  MyFunctionVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref MyFunction
      Description: SnapStart enabled version

Outputs:
  FunctionArn:
    Value: !GetAtt MyFunction.Arn
  VersionArn:
    Value: !Ref MyFunctionVersion
```
AWS Lambda SnapStart Checklist
| Check | Command | Expected |
|---|---|---|
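| Runtime | get-function-configuration | java11, java17, or java21 |
| Architecture | get-function-configuration | x86_64 (confirm arm64 support in current docs) |
| SnapStart enabled | get-function-configuration | ApplyOn: PublishedVersions |
| Version published | list-versions-by-function | Version exists |
| Snapshot created | get-function-configuration --qualifier N | State: Active, OptimizationStatus: On |
| Snapshot used | CloudWatch logs | RESTORE_START logged |
| Restore time | CloudWatch logs (REPORT line) | Restore Duration well below the previous Init Duration |
| Unique IDs | test | Different after restore |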
Verify the Fix
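```bash
# After implementing SnapStart correctly:

# 1. Publish a new version and wait for its snapshot
aws lambda publish-version --function-name my-function
# Output: "Version": "2"
aws lambda wait published-version-active --function-name my-function --qualifier 2

# 2. Invoke it (the first call to a new version is a cold start from the snapshot)
aws lambda invoke --function-name my-function:2 response.json

# 3. Check CloudWatch Logs for:
# - RESTORE_START (cold start served from the snapshot)
# - START RequestId: ... Version: 2
# - REPORT ... Restore Duration: ... ms
# (INIT_START for this version is logged around publish time, when the snapshot was taken)

# 4. Compare performance:
# Before SnapStart: "Init Duration" on cold-start REPORT lines
# After SnapStart: "Restore Duration" on cold-start REPORT lines

# 5. Test unique IDs: parallel calls land in separate restored environments
#    (assumes the handler returns a JSON object with a "requestId" field)
for i in {1..10}; do
  aws lambda invoke --function-name my-function:2 \
    --cli-binary-format raw-in-base64-out \
    --payload '{"test":true}' "out-$i.json" > /dev/null &
done
wait
jq -r '.requestId' out-*.json | sort | uniq -d
# No output means every ID was unique

# 6. Monitor restore durations in production (last day):
aws logs start-query \
  --log-group-name /aws/lambda/my-function \
  --start-time $(date -u -d '-1 day' +%s) \
  --end-time $(date -u +%s) \
  --query-string 'filter @type = "REPORT"
    | parse @message /Restore Duration: (?<restoreDuration>[0-9.]+) ms/
    | filter ispresent(restoreDuration)
    | stats avg(restoreDuration), max(restoreDuration) by bin(5m)'
# Then fetch results with: aws logs get-query-results --query-id <queryId>

# Verify latency improvement: Restore Duration should be a fraction of the
# Init Duration you measured before SnapStart
```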
Related Issues
- [Fix AWS Lambda Cold Start](/articles/fix-aws-lambda-cold-start)
- [Fix AWS Lambda Timeout](/articles/fix-aws-lambda-timeout)
- [Fix AWS Lambda Memory Issues](/articles/fix-aws-lambda-memory-issues)