What's Actually Happening

You enabled SnapStart for your Java Lambda function expecting near-instant cold starts, but functions still experience slow initialization, or SnapStart fails to create snapshots entirely. The latency improvements don't materialize as expected.

The Error You'll See

SnapStart not enabled:

```bash
$ aws lambda get-function-configuration --function-name my-function

{
  "FunctionName": "my-function",
  "Runtime": "java17",
  "SnapStart": {
    "ApplyOn": "None",  # Should be "PublishedVersions"
    "OptimizationStatus": "Off"
  }
}
```

Snapshot creation failed:

```bash
$ aws lambda publish-version --function-name my-function

An error occurred (InvalidParameterValueException) when calling the PublishVersion operation: SnapStart initialization failed. Check your function's initialization code for runtime errors or timeouts exceeding 15 seconds.
```

Cold start still slow:

```bash
# CloudWatch Logs show:
INIT_START Runtime Version: java:17
RESTORE_START SnapshotId: my-snapshot-id
RESTORE_DURATION: 500ms  # Should be <100ms
INVOKE_START

# Or no snapshot restore at all:
INIT_START Runtime Version: java:17
INIT_DURATION: 8500ms  # No snapshot used!
```

Why This Happens

  1. Runtime not supported - SnapStart for Java requires a Java 11 or later managed runtime (java11, java17, java21)
  2. Not publishing versions - SnapStart applies only to published versions, never to $LATEST
  3. Init timeout - Initialization exceeds 15-second limit
  4. Memory too low - Snapshot creation requires adequate memory
  5. Network calls in init - Static initialization makes network calls
  6. Random number issues - Random state captured in the snapshot is reused after every restore
  7. Thread pools not recreated - Executors not reinitialized after restore
  8. Native code used - JNI or native libraries incompatible with snapshot

Step 1: Verify SnapStart Requirements

```bash
# Check function runtime (must be Java 11 or later):
aws lambda get-function-configuration --function-name my-function \
  --query 'Runtime' --output text

# Expected: java11, java17, or java21

# Check function memory (recommend >= 1024MB):
aws lambda get-function-configuration --function-name my-function \
  --query 'MemorySize' --output text

# Recommended: 1024-2048 MB for SnapStart

# Check function timeout:
aws lambda get-function-configuration --function-name my-function \
  --query 'Timeout' --output text

# Max init time: 15 seconds for SnapStart

# Check architecture (x86_64 only, not ARM):
aws lambda get-function-configuration --function-name my-function \
  --query 'Architectures' --output text

# Must be x86_64

# Check if using Provisioned Concurrency (incompatible):
aws lambda list-provisioned-concurrency-configs \
  --function-name my-function

# Should return an empty ProvisionedConcurrencyConfigs list if not using PC

# Verify all requirements:
aws lambda get-function-configuration --function-name my-function \
  --query '{Runtime:Runtime, Memory:MemorySize, Timeout:Timeout, Arch:Architectures}'
```

Step 2: Enable SnapStart Correctly

```bash
# Enable SnapStart on function:
aws lambda update-function-configuration \
  --function-name my-function \
  --snap-start ApplyOn=PublishedVersions

# Verify SnapStart enabled:
aws lambda get-function-configuration --function-name my-function \
  --query 'SnapStart'

# Expected: "ApplyOn": "PublishedVersions"
# (OptimizationStatus is reported per published version; check it with --qualifier below)

# CRITICAL: Must publish a version for SnapStart to activate:
aws lambda publish-version \
  --function-name my-function \
  --description "SnapStart enabled version"

# Note the version number:
# {
#   "FunctionName": "my-function",
#   "Version": "1",
#   ...
# }

# Verify snapshot created:
aws lambda get-function-configuration \
  --function-name my-function \
  --qualifier 1 \
  --query 'SnapStart'

# Should show optimization applied:
# {
#   "ApplyOn": "PublishedVersions",
#   "OptimizationStatus": "On"
# }

# Invoke with version (not $LATEST).
# AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload:
aws lambda invoke \
  --function-name my-function:1 \
  --cli-binary-format raw-in-base64-out \
  --payload '{"test": "data"}' \
  response.json

# Check CloudWatch for snapshot restore:
# Look for RESTORE_START, RESTORE_DURATION logs
```

Step 3: Fix Initialization Code for SnapStart

```java
// BEFORE: Problematic initialization
public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // Problem 1: Network call in static init
    private static final String CONFIG = fetchConfigFromS3();

    // Problem 2: Thread pool in static field
    private static final ExecutorService executor = Executors.newFixedThreadPool(10);

    // Problem 3: Random seeded at static init (the seed is captured in the snapshot)
    private static final Random random = new Random();

    // Problem 4: File descriptor
    private static final FileInputStream file;

    static {
        try {
            file = new FileInputStream("/tmp/data.txt");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Handler code
        return "OK";
    }

    private static String fetchConfigFromS3() {
        // This runs at init time - BAD for SnapStart
        // Network calls should be in handler
        return "{}"; // placeholder
    }
}

// AFTER: SnapStart-compatible initialization
public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // Lazy initialization - don't fetch until handler runs
    private String config;

    // Thread pool - avoid creating your own thread pools at init time

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Lazy load configuration
        if (config == null) {
            config = fetchConfigFromS3();
        }

        // Generate the ID per invocation so it is unique after restore
        String id = UUID.randomUUID().toString();

        return processRequest(input, id);
    }

    private String fetchConfigFromS3() {
        // Fetch in handler, not static init
        S3Client s3 = S3Client.create();
        return s3.getObjectAsBytes(GetObjectRequest.builder()
                .bucket("my-bucket")
                .key("config.json")
                .build())
            .asUtf8String();
    }
}

// Use CRaC (Coordinated Restore at Checkpoint) runtime hooks for custom logic
import org.crac.Core;
import org.crac.Resource;

public class SnapStartAwareHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    private volatile boolean initialized = false;

    public SnapStartAwareHandler() {
        // Register this handler so the hooks below are called
        Core.getGlobalContext().register(this);
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        if (!initialized) {
            // Initialize only once after restore
            initialize();
            initialized = true;
        }

        return handle(input);
    }

    private void initialize() {
        // This runs once per restored snapshot
        // Called when cold start happens
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Cleanup before snapshot
        closeConnections();
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Reinitialize after restore
        reinitializeConnections();
    }
}
```
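The lazy-loading idea in this step can be factored into a small reusable helper. This is a minimal sketch, not code from the handler itself: `Lazy` is a hypothetical class name, and the supplier stands in for whatever expensive call (such as the S3 fetch) should wait until the first invocation. It assumes the loader never returns null.

```java
import java.util.function.Supplier;

// Hypothetical helper: defers an expensive call until first use and caches the result.
// Assumes the loader never returns null (a null result would be reloaded on every call).
final class Lazy<T> {
    private final Supplier<T> loader;
    private volatile T value;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    // Double-checked locking: the loader runs at most once per cached value,
    // even when several threads call get() at the same time.
    T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = loader.get();
                    value = result;
                }
            }
        }
        return result;
    }

    // Drops the cached value so the next get() reloads it,
    // for example from an after-restore hook.
    void reset() {
        synchronized (this) {
            value = null;
        }
    }
}
```

A handler could hold a field such as `new Lazy<>(this::fetchConfigFromS3)` and call `get()` inside `handleRequest`, so nothing touches the network while the snapshot is being created.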

Step 4: Handle Unique IDs and Random Numbers

```java
// PROBLEM: Random numbers and IDs not unique after restore

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // BAD: Random restored from snapshot (its seed was fixed during init)
    private final Random random = new Random();

    // GOOD: UUID is fine (generates new on each call)

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // BAD: May return same "random" value after restore
        int rand = random.nextInt();

        // GOOD: UUID always unique
        String uniqueId = UUID.randomUUID().toString();

        return "ID: " + uniqueId;
    }
}

// Use CRaC runtime hooks for custom reset logic
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    private SecureRandom secureRandom;

    public MyHandler() {
        secureRandom = new SecureRandom();
        // Register so the hooks below are called
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Called before snapshot is taken
        // Reset random number generator
        secureRandom = null;
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Called after snapshot is restored
        // Reinitialize random number generator
        if (secureRandom == null) {
            secureRandom = new SecureRandom();
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        if (secureRandom == null) {
            secureRandom = new SecureRandom();
        }

        byte[] randomBytes = new byte[16];
        secureRandom.nextBytes(randomBytes);

        return Base64.getEncoder().encodeToString(randomBytes);
    }
}
```
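To see why a generator created during init is risky, the effect can be reproduced locally without Lambda. The sketch below is only an analogy: it uses Java serialization as a stand-in for taking and restoring a snapshot, and `SnapshotRandomDemo` is a hypothetical class name. Two copies restored from the same saved state continue with identical values, which is the behavior to avoid across execution environments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Random;

final class SnapshotRandomDemo {

    // Serializes the generator's state, standing in for "take a snapshot".
    static byte[] snapshot(Random random) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(random);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserializes a fresh copy, standing in for "restore an execution environment".
    static Random restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Random) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] image = snapshot(new Random());
        Random envA = restore(image);
        Random envB = restore(image);
        // Both "environments" continue from the same internal state.
        System.out.println(envA.nextLong() == envB.nextLong()); // prints true
    }
}
```

Generating values inside the handler with `UUID.randomUUID()` or `SecureRandom`, as shown in this step, avoids carrying a fixed seed into every restored environment.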

Step 5: Fix Network and Resource Connections

```java
// PROBLEM: Connections restored from snapshot are stale
// (url, dbUrl, and props are placeholders defined elsewhere)

public class MyHandler implements RequestHandler<Map<String, Object>, String> {

    // BAD: Connection in static field
    private static Connection dbConnection;

    static {
        try {
            dbConnection = DriverManager.getConnection(url);
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // GOOD: Lazy connection with refresh
    private Connection connection;
    private final Object connectionLock = new Object();

    private Connection getConnection() throws SQLException {
        synchronized (connectionLock) {
            if (connection == null || !connection.isValid(1)) {
                connection = DriverManager.getConnection(dbUrl, props);
            }
            return connection;
        }
    }

    // GOOD: Use connection pool with validation
    private static HikariDataSource dataSource;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(dbUrl);
        config.setMaximumPoolSize(5);
        config.setConnectionTimeout(10000);
        config.setValidationTimeout(1000);
        // HikariCP has no setTestOnBorrow; it checks connections when they are borrowed
        config.setConnectionTestQuery("SELECT 1");

        dataSource = new HikariDataSource(config);
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        try (Connection conn = dataSource.getConnection()) {
            // Connection is validated and fresh
            return executeQuery(conn, input);
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}

// AWS SDK clients automatically handle SnapStart
// They check validity and reconnect if needed

public class S3Handler implements RequestHandler<Map<String, Object>, String> {

    // GOOD: AWS SDK clients are SnapStart-aware
    private final S3Client s3 = S3Client.builder()
        .region(Region.US_EAST_1)
        .build();

    // AWS SDK automatically handles:
    // - Connection refresh after restore
    // - Credential refresh
    // - Retry logic

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // S3 client works correctly after restore
        return s3.listBuckets().buckets().stream()
            .map(Bucket::name)
            .collect(Collectors.joining(","));
    }
}
```
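The "lazy connection with refresh" pattern in this step does not depend on JDBC. As a rough sketch under that assumption, the hypothetical `RevalidatingHolder` below caches any resource and replaces it when a caller-supplied validity check fails. It does not close the resource it replaces; real code would need to do that.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: hands out a cached resource and replaces it when validation fails.
final class RevalidatingHolder<T> {
    private final Supplier<T> factory;
    private final Predicate<T> isValid;
    private T current;

    RevalidatingHolder(Supplier<T> factory, Predicate<T> isValid) {
        this.factory = factory;
        this.isValid = isValid;
    }

    // Returns the cached resource if it still passes validation,
    // otherwise creates a new one (for example, after a restore left it stale).
    synchronized T get() {
        if (current == null || !isValid.test(current)) {
            current = factory.get();
        }
        return current;
    }
}
```

For a database connection, the factory would wrap `DriverManager.getConnection(...)` and the predicate would wrap `connection.isValid(1)`, each converting `SQLException` into an unchecked exception or a `false` result.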

Step 6: Debug SnapStart Initialization Issues

```bash
# Enable verbose logging in Lambda:
# Set environment variable:
#   JAVA_TOOL_OPTIONS: "-Dlogging.level.com.amazonaws.services.lambda=DEBUG"

# Or in code:
#   System.setProperty("logging.level.com.amazonaws.services.lambda", "DEBUG");

# Check CloudWatch Logs for initialization:
# Look for:
#   INIT_START
#   SNAPSHOT_START
#   SNAPSHOT_END
#   RESTORE_START
#   RESTORE_END

# Test init time:
# Deploy and invoke with published version
aws lambda invoke --function-name my-function:1 out.json

# Check init duration in logs:
#   INIT_START
#   INIT_REPORT Init Duration: 2500 ms
#   RESTORE_START
#   RESTORE_REPORT Restore Duration: 50 ms  # Should be fast

# Test without SnapStart (use $LATEST):
aws lambda invoke --function-name my-function out.json

# Compare init times:
#   With SnapStart: RESTORE_DURATION < 100ms
#   Without SnapStart: INIT_DURATION > 1000ms

# Check snapshot creation status:
aws lambda get-function-configuration \
  --function-name my-function \
  --qualifier 1

# Look for SnapStart optimization status:
# "SnapStart": {
#   "ApplyOn": "PublishedVersions",
#   "OptimizationStatus": "On"
# }

# If snapshot failed:
# Check for InitDuration > 15 seconds in logs
# Or look for exception in CloudWatch
```
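Comparing init and restore times by eye gets tedious once there are many log lines. The following sketch pulls the numbers out of lines shaped like the `INIT_REPORT` and `RESTORE_REPORT` samples in this step. `LambdaLogDurations` is a hypothetical helper, and the pattern assumes the `<Phase> Duration: <number> ms` layout shown above; adjust it if your log lines differ.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LambdaLogDurations {

    // Matches "Init Duration: 2500 ms" or "Restore Duration: 50.25 ms".
    // The lookbehind skips "Billed Restore Duration", which is a different figure.
    private static final Pattern DURATION = Pattern.compile(
        "(?<!Billed )(Init|Restore) Duration:\\s*([0-9]+(?:\\.[0-9]+)?)\\s*ms");

    // Returns the duration in milliseconds for the given phase ("Init" or "Restore"),
    // or an empty result when the line does not report that phase.
    static OptionalDouble durationMs(String logLine, String phase) {
        Matcher matcher = DURATION.matcher(logLine);
        while (matcher.find()) {
            if (matcher.group(1).equals(phase)) {
                return OptionalDouble.of(Double.parseDouble(matcher.group(2)));
            }
        }
        return OptionalDouble.empty();
    }
}
```

Feeding exported log lines through `durationMs(line, "Restore")` gives a list of restore times that can be compared against the init times of the unpublished `$LATEST` version.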

Step 7: Handle Thread Pools and Executors

```java
// PROBLEM: Thread pools restored from snapshot
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    // BAD: Static thread pool created during init
    // private static final ExecutorService executor = Executors.newFixedThreadPool(10);

    // GOOD: Recreate thread pool after restore
    private ExecutorService executor;

    public MyHandler() {
        // Register so the hooks below are called
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Shutdown threads before snapshot
        if (executor != null) {
            executor.shutdownNow();
            try {
                executor.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            executor = null;
        }
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Recreate thread pool after restore
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newFixedThreadPool(10);
        }
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Ensure executor is ready
        if (executor == null || executor.isShutdown()) {
            executor = Executors.newFixedThreadPool(10);
        }

        Future<String> future = executor.submit(() -> processAsync(input));

        try {
            return future.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            future.cancel(true);
            throw new RuntimeException(e);
        }
    }
}

// Simpler: avoid owning a thread pool at all
public class SimpleHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Don't create custom thread pools

        // Use CompletableFuture for async work
        // (supplyAsync runs on the JVM's common ForkJoinPool):
        CompletableFuture<String> future =
            CompletableFuture.supplyAsync(() -> processAsync(input));

        return future.join();
    }
}
```

Step 8: Test SnapStart Performance

```bash
# Create test script:
cat << 'EOF' > test-snapstart.sh
#!/bin/bash

FUNCTION="my-function"
VERSION="1"
ITERATIONS=10

echo "=== Testing SnapStart Performance ==="

echo -e "\nCold Start Tests (with SnapStart):"
for i in $(seq 1 $ITERATIONS); do
  # Force a new snapshot by updating an env var slightly
  # (note: --environment replaces ALL existing variables)
  aws lambda update-function-configuration \
    --function-name "$FUNCTION" \
    --environment "Variables={TEST_VAR=$i}" > /dev/null

  # Wait for the configuration update to finish
  aws lambda wait function-updated --function-name "$FUNCTION"

  # Publish new version and capture its number
  VERSION=$(aws lambda publish-version \
    --function-name "$FUNCTION" \
    --description "Test version $i" \
    --query 'Version' --output text)

  # Wait for the new version (and its snapshot) to become active
  aws lambda wait published-version-active \
    --function-name "$FUNCTION" --qualifier "$VERSION"

  # Invoke with cold start
  START=$(date +%s%N)
  aws lambda invoke --function-name "$FUNCTION:$VERSION" out.json > /dev/null
  END=$(date +%s%N)

  DURATION=$(( (END - START) / 1000000 ))
  echo "Cold start $i: ${DURATION}ms"

  sleep 5
done

echo -e "\nWarm Start Tests:"
for i in $(seq 1 $ITERATIONS); do
  START=$(date +%s%N)
  aws lambda invoke --function-name "$FUNCTION:$VERSION" out.json > /dev/null
  END=$(date +%s%N)

  DURATION=$(( (END - START) / 1000000 ))
  echo "Warm start $i: ${DURATION}ms"
done

echo -e "\nCheck CloudWatch for detailed metrics"
EOF

chmod +x test-snapstart.sh

# Run tests:
./test-snapstart.sh

# CloudWatch Logs Insights filter:
# filter @message like /INIT_START
```

Step 9: Monitor SnapStart Metrics

```bash
# CloudWatch metrics for SnapStart:

# 1. Check SnapStart restores:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name SnapStartRestoreDuration \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average

# 2. Monitor init duration:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name InitDuration \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average

# 3. Create CloudWatch alarm for slow restores:
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-SnapStart-SlowRestore" \
  --alarm-description "Alert when SnapStart restore exceeds 500ms" \
  --metric-name SnapStartRestoreDuration \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=my-function \
  --statistic Average \
  --period 60 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --treat-missing-data notBreaching

# 4. Check for snapshot failures:
# Look for Lambda.SnapStart.SnapshotTimeout errors in logs
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-function \
  --filter-pattern "SnapStart" \
  --start-time $(date -u -d '-1 hour' +%s)000

# 5. Use Lambda Insights for detailed performance:
# Enable in function configuration:
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:21
```

Step 10: Implement Best Practices for SnapStart

```java
// Complete SnapStart-compatible Lambda function
// (loadConfig() and processRequest() are application-specific and omitted):

import java.util.Map;
import java.util.UUID;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.services.s3.S3Client;

public class ProductionHandler implements RequestHandler<Map<String, Object>, String>, Resource {

    // Lazy-loaded configuration
    private volatile String config;
    private final Object configLock = new Object();

    // AWS SDK client (SnapStart-aware)
    private final S3Client s3 = S3Client.create();

    // Connection pool with validation
    private volatile HikariDataSource dataSource;

    public ProductionHandler() {
        // Register so the hooks below are called
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Cleanup before snapshot
        closeDatabaseConnections();
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Reinitialize after restore
        initializeConnections();
    }

    @Override
    public String handleRequest(Map<String, Object> input, Context ctx) {
        // Ensure initialization
        ensureInitialized();

        // Generate unique ID per invocation
        String requestId = UUID.randomUUID().toString();

        // Process request
        return processRequest(input, requestId, ctx);
    }

    private void ensureInitialized() {
        if (config == null) {
            synchronized (configLock) {
                if (config == null) {
                    config = loadConfig();
                }
            }
        }
    }

    private void closeDatabaseConnections() {
        HikariDataSource ds = dataSource;
        if (ds != null && !ds.isClosed()) {
            ds.close();
            dataSource = null;
        }
    }

    private void initializeConnections() {
        if (dataSource == null || dataSource.isClosed()) {
            HikariConfig hikariConfig = new HikariConfig();
            hikariConfig.setJdbcUrl(System.getenv("DB_URL"));
            hikariConfig.setMaximumPoolSize(5);
            hikariConfig.setConnectionTimeout(5000);
            hikariConfig.setValidationTimeout(1000);
            dataSource = new HikariDataSource(hikariConfig);
        }
    }
}
```

deployment.yaml for CloudFormation:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Lambda function with SnapStart

# The required Role property and the DatabaseUrl parameter are not shown here.
Resources:
  MyFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: my-function
      Runtime: java17
      MemorySize: 1024
      Timeout: 30
      Handler: com.example.ProductionHandler
      Code:
        S3Bucket: my-bucket
        S3Key: function.jar
      SnapStart:
        ApplyOn: PublishedVersions
      Environment:
        Variables:
          DB_URL: !Ref DatabaseUrl

  MyFunctionVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref MyFunction
      Description: SnapStart enabled version

Outputs:
  FunctionArn:
    Value: !GetAtt MyFunction.Arn
  VersionArn:
    Value: !Ref MyFunctionVersion
```

AWS Lambda SnapStart Checklist

| Check | Command | Expected |
|---|---|---|
| Runtime | get-function-configuration | java11, java17, or java21 |
| Architecture | get-function-configuration | x86_64 |
| SnapStart enabled | get-function-configuration | ApplyOn: PublishedVersions |
| Version published | list-versions-by-function | Version exists |
| Snapshot created | CloudWatch logs | RESTORE_START logged |
| Restore time | CloudWatch logs | < 100ms |
| Unique IDs | test | Different after restore |

Verify the Fix

```bash
# After implementing SnapStart correctly:

# 1. Publish new version
aws lambda publish-version --function-name my-function
# Output: Version "2"

# 2. Invoke cold start
aws lambda invoke --function-name my-function:2 response.json

# 3. Check CloudWatch logs for:
# - INIT_START
# - SNAPSHOT_START (if creating new)
# - RESTORE_START (if using snapshot)
# - RESTORE_DURATION: < 100ms
# - INVOKE_START

# 4. Compare performance:
# Before SnapStart: INIT_DURATION: 5000-10000ms
# After SnapStart: RESTORE_DURATION: 20-80ms

# 5. Test unique IDs
# (assumes the handler returns a JSON object with a requestId field;
#  AWS CLI v2 needs --cli-binary-format for a raw JSON payload)
for i in {1..10}; do
  aws lambda invoke --function-name my-function:2 \
    --cli-binary-format raw-in-base64-out \
    --payload '{"test":true}' out.json
  jq '.requestId' out.json
done
# Each should be unique

# 6. Monitor in production:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name SnapStartRestoreDuration \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time $(date -u -d '-1 day' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Average

# Verify latency improvement:
# Before: 5000-10000ms cold start
# After: 20-100ms restore time
```
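Averages can hide the occasional slow restore, so when comparing before and after it may help to look at percentiles as well. The following is a small sketch using the nearest-rank method; `LatencySummary` is a hypothetical helper, and the sample values in the usage note are made up for illustration rather than measured.

```java
import java.util.Arrays;

final class LatencySummary {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }
}
```

For example, with illustrative restore times of 20, 80, 35, 500, and 45 ms, `percentile(samples, 50)` returns 45 while `percentile(samples, 99)` returns 500, which exposes the outlier that an average would blur.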

  • [Fix AWS Lambda Cold Start](/articles/fix-aws-lambda-cold-start)
  • [Fix AWS Lambda Timeout](/articles/fix-aws-lambda-timeout)
  • [Fix AWS Lambda Memory Issues](/articles/fix-aws-lambda-memory-issues)