Introduction

Java OutOfMemoryError: Java heap space occurs when the JVM cannot allocate memory for new objects because the heap is full and garbage collection cannot free sufficient space. This error indicates either insufficient heap allocation for the workload or a memory leak where objects are retained longer than intended. In production environments, this causes application crashes, service outages, and data loss if not handled properly. The error typically manifests after the application has been running for hours or days, making diagnosis challenging without proper monitoring and heap analysis tools.
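For reference, the failure mode is easy to reproduce: any code path that keeps allocating objects that stay reachable will eventually exhaust the heap. A minimal, illustrative reproducer (class and method names are ours, not from any library); run it with a small heap such as `java -Xmx64m HeapExhauster`:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal OOM reproducer: with a small heap (e.g. -Xmx64m) this dies with
// java.lang.OutOfMemoryError: Java heap space, because every allocated
// chunk stays reachable and GC can free nothing.
public class HeapExhauster {

    // Allocate `chunks` blocks of `chunkBytes` each and keep them reachable.
    static List<byte[]> allocate(int chunks, int chunkBytes) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            retained.add(new byte[chunkBytes]);
        }
        return retained;
    }

    public static void main(String[] args) {
        // Effectively unbounded: 1 MB chunks until the heap is exhausted.
        allocate(Integer.MAX_VALUE, 1024 * 1024);
    }
}
```

The same program run with a generous heap simply takes longer to fail, which mirrors the production pattern of crashes appearing only after hours or days.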

Symptoms

  • Application crashes with java.lang.OutOfMemoryError: Java heap space in logs
  • JVM terminates with exit code 134 or 1
  • GC logs show Full GC (Allocation Failure) with minimal memory freed
  • Application slows down significantly before crash (GC thrashing)
  • Heap usage stays near maximum even after Full GC
  • Issue appears after traffic increase, new feature deploy, or dataset growth
  • Error occurs at consistent intervals (daily, weekly) suggesting accumulation

Common Causes

  • Memory leak: Objects retained indefinitely (static collections, unclosed resources)
  • Insufficient heap size for workload (Xmx too low)
  • Large in-memory caches without eviction policy
  • Unbounded session storage or request buffering
  • Memory leak in third-party libraries or frameworks
  • GC algorithm misconfigured for workload pattern
  • Native memory leak causing heap reduction (compressed oops issue)

Step-by-Step Fix

### 1. Enable GC and OOM diagnostic logging

Configure JVM to capture diagnostic information:

```bash
# Add to JVM startup options

# Java 8
-Xloggc:/var/log/app/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintTenuringDistribution
-XX:+PrintHeapAtGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app/heapdump.hprof
-XX:OnOutOfMemoryError="kill -9 %p"

# Java 11+ (unified GC logging)
-Xlog:gc*,gc+heap=debug:file=/var/log/app/gc.log:time,uptime,level,tags
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app/heapdump.hprof

# Verify GC logging is active
jcmd <pid> VM.command_line | grep -i gc
```

Key GC log metrics:

  • GC time: time spent in garbage collection
  • Heap before/after: memory freed by each collection
  • GC frequency: how often collections run
  • Pause time: stop-the-world duration seen by the application
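The same counters are also available at runtime without log parsing; a small sketch using the standard `GarbageCollectorMXBean` management API (the class name is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads cumulative GC count and total GC time from the running JVM via
// the standard management API -- the same figures the GC log records.
public class GcStats {

    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

This is handy for a quick health endpoint when full GC logging is not yet enabled.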

### 2. Analyze GC logs for memory patterns

Use GC log analysis tools:

```bash
# Use gceasy.io (online analyzer)
# Upload gc.log to https://gceasy.io/

# Or analyze locally with standard command-line tools

# Count Full GC events per timestamp
grep "Full GC" gc.log | awk '{print $1, $2}' | uniq -c | head -20

# Check heap usage pattern
grep "Heap" gc.log | tail -50

# Look for memory leak indicators:
# - Full GC running frequently (every few minutes)
# - Old generation not freed effectively
# - GC time increasing over the application's lifetime
```

Memory leak indicators in GC logs:

```
# Healthy GC pattern:
[2026-03-31T10:00:00] Full GC 2048M->512M(4096M) - Freed 1536M in 234ms
[2026-03-31T10:30:00] Full GC 2100M->520M(4096M) - Freed 1580M in 241ms

# Memory leak pattern:
[2026-03-31T10:00:00] Full GC 3800M->3200M(4096M) - Freed 600M in 891ms
[2026-03-31T10:15:00] Full GC 3900M->3500M(4096M) - Freed 400M in 1234ms
[2026-03-31T10:30:00] Full GC 4000M->3700M(4096M) - Freed 300M in 2341ms
# Notice: less memory freed each time, and GC taking longer
```
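Spotting this trend can be automated; the sketch below pulls the `before->after(total)` figures out of lines shaped like the ones above and flags a shrinking freed amount. The log line format is an assumption here: adjust the regex to your JVM's actual output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the MB freed by each Full GC from lines containing
// "Full GC <before>M-><after>M(<total>M)" and reports whether the freed
// amount shrinks collection after collection -- a classic leak signature.
public class GcTrend {

    private static final Pattern FULL_GC =
            Pattern.compile("Full GC (\\d+)M->(\\d+)M\\((\\d+)M\\)");

    static List<Long> freedPerGc(List<String> lines) {
        List<Long> freed = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FULL_GC.matcher(line);
            if (m.find()) {
                freed.add(Long.parseLong(m.group(1)) - Long.parseLong(m.group(2)));
            }
        }
        return freed;
    }

    // True if every Full GC freed strictly less than the one before it.
    static boolean freedIsShrinking(List<Long> freed) {
        for (int i = 1; i < freed.size(); i++) {
            if (freed.get(i) >= freed.get(i - 1)) {
                return false;
            }
        }
        return freed.size() > 1;
    }
}
```

Applied to the leak pattern above it would see 600, 400, 300 MB freed and report a shrinking trend.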

### 3. Capture and analyze heap dump

Heap dump shows what objects are consuming memory:

```bash
# Heap dump is auto-generated if HeapDumpOnOutOfMemoryError is set

# Or manually trigger a heap dump
jmap -dump:format=b,file=/tmp/heap.hprof <pid>

# Or use jcmd (recommended for large heaps)
jcmd <pid> GC.heap_dump /tmp/heap.hprof

# For Kubernetes pods
kubectl exec <pod-name> -- jcmd 1 GC.heap_dump /tmp/heap.hprof
kubectl cp <pod-name>:/tmp/heap.hprof ./heap.hprof
```

Analyze heap dump with Eclipse MAT:

```bash
# Download Eclipse MAT: https://www.eclipse.org/mat/downloads.php

# Open heap dump in MAT
# File > Open Heap Dump > select heap.hprof

# Run the Leak Suspects Report
# Right-click the heap dump > Leak Suspects Report
# (MAT also offers to run it when the dump is first opened)

# Key views in MAT:
# 1. Dominator Tree: objects retaining the most memory
# 2. Histogram: object count and size by class
# 3. Leak Suspects: automatic leak detection
# 4. Top Consumers: largest memory consumers
```

MAT query language (OQL) examples:

```sql
-- Note: MAT OQL supports SELECT/FROM/WHERE only; there is no
-- ORDER BY, GROUP BY, or LIMIT. Sort results in the result view instead.

-- Find all HashMaps with more than 1000 entries
SELECT * FROM java.util.HashMap h WHERE h.size > 1000

-- Find all instances of a specific class
SELECT * FROM com.example.MyClass

-- Find large byte arrays (over 1 MB)
SELECT * FROM byte[] b WHERE b.@length > 1048576

-- Find strings with a large retained size
SELECT toString(s) FROM java.lang.String s WHERE s.@retainedHeapSize > 10240
```

### 4. Identify memory leak sources

Common memory leak patterns:

```java
// Pattern 1: Static collection growing unbounded
public class CacheLeak {
    private static final List<Object> cache = new ArrayList<>();

    public void add(Object obj) {
        cache.add(obj); // Never removed, grows forever
    }
}

// Fix: Use a bounded cache with eviction (Guava)
private static final Cache<String, Object> cache = CacheBuilder.newBuilder()
    .maximumSize(10000)
    .expireAfterWrite(1, TimeUnit.HOURS)
    .build();

// Pattern 2: Unclosed resources
public void readFiles(List<String> files) {
    for (String file : files) {
        FileInputStream fis = new FileInputStream(file);
        // If an exception occurs, the stream is never closed
        process(fis);
    }
}

// Fix: Use try-with-resources
public void readFiles(List<String> files) {
    for (String file : files) {
        try (FileInputStream fis = new FileInputStream(file)) {
            process(fis);
        } catch (IOException e) {
            log.error("Failed: " + file, e);
        }
    }
}

// Pattern 3: ThreadLocal not removed
private static final ThreadLocal<byte[]> buffer =
    ThreadLocal.withInitial(() -> new byte[1024 * 1024]);

public void process() {
    byte[] buf = buffer.get(); // 1MB per thread
    // Never removed, and thread pools keep threads alive
}

// Fix: Remove after use
public void process() {
    try {
        byte[] buf = buffer.get();
        // use buffer
    } finally {
        buffer.remove(); // Critical for thread pools
    }
}

// Pattern 4: Listeners/callbacks not unregistered
public class EventBusLeak {
    private final EventBus eventBus = EventBus.getDefault();

    public void init() {
        eventBus.register(this); // Registered but never unregistered
    }

    // Fix: Unregister when done
    public void destroy() {
        eventBus.unregister(this);
    }
}
```

### 5. Tune JVM heap configuration

Set appropriate heap sizes:

```bash
# Calculate heap based on container memory
# Rule of thumb: heap = 50-75% of container memory

# Kubernetes example (container limit: 4GB)
# resources:
#   limits:
#     memory: "4Gi"
#   requests:
#     memory: "2Gi"

# JVM options:
-Xms3g    # Initial heap (often set equal to -Xmx to avoid resizing pauses)
-Xmx3g    # Max heap (75% of 4GB)

# Alternative: size heap as a percentage of container memory
-XX:MaxRAMPercentage=75.0

# For Java 8u191+ in containers
-XX:+UseContainerSupport    # Enabled by default in Java 10+
-XX:MaxRAMPercentage=75.0
-XX:InitialRAMPercentage=50.0

# Heap sizing by workload type:
# - Web application: 2-4GB typical
# - Batch processing: 4-8GB or more
# - In-memory cache: 8-16GB (consider off-heap)
# - Microservice: 512MB-2GB
```
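The sizing rule above is plain arithmetic; a tiny helper (illustrative, not a real API) makes the calculation explicit:

```java
// Sketch of the heap-sizing rule of thumb: give the JVM heap 50-75% of
// container memory, leaving the rest for Metaspace, thread stacks, the
// code cache, and direct buffers.
public class HeapSizing {

    // Returns the -Xmx value in MB for a container of `containerMb`,
    // using `heapPercent` (e.g. 75 for the 75% rule).
    static long maxHeapMb(long containerMb, int heapPercent) {
        return containerMb * heapPercent / 100;
    }

    public static void main(String[] args) {
        // A 4 GiB container with the 75% rule -> -Xmx3072m,
        // matching the -Xmx3g example above.
        System.out.println("-Xmx" + maxHeapMb(4096, 75) + "m");
    }
}
```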

GC algorithm selection:

```bash
# Java 8 options:

# G1 GC (recommended for most workloads; the default since Java 9)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=16m
-XX:G1ReservePercent=10

# Parallel GC (throughput-focused; the default in Java 8)
-XX:+UseParallelGC
-XX:ParallelGCThreads=8
-XX:MaxGCPauseMillis=100

# Serial GC (small heaps < 1GB)
-XX:+UseSerialGC

# ZGC for low latency (experimental from Java 11, production-ready in 15)
-XX:+UseZGC    # Sub-millisecond pauses

# Java 21+ (Generational ZGC)
-XX:+UseZGC -XX:+ZGenerational
```

### 6. Implement memory monitoring

Add runtime memory monitoring:

```yaml
# Spring Boot Actuator metrics (application.yml)
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    export:
      prometheus:
        enabled: true
```

```java
// Key JVM memory metrics:
// jvm_memory_used_bytes{area="heap"}
// jvm_memory_committed_bytes{area="heap"}
// jvm_memory_max_bytes{area="heap"}
// jvm_gc_pause_seconds
// jvm_gc_memory_allocated_bytes_total

// Programmatic memory check
@Component
public class MemoryHealthIndicator implements HealthIndicator {

    private final MemoryMXBean memoryMXBean;

    public MemoryHealthIndicator() {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
    }

    @Override
    public Health health() {
        MemoryUsage heapUsage = memoryMXBean.getHeapMemoryUsage();
        long used = heapUsage.getUsed();
        long max = heapUsage.getMax();
        double usagePercent = (double) used / max * 100;

        if (usagePercent > 90) {
            return Health.down()
                .withDetail("heap_used", used)
                .withDetail("heap_max", max)
                .withDetail("usage_percent", usagePercent)
                .build();
        }

        return Health.up()
            .withDetail("heap_used", used)
            .withDetail("heap_max", max)
            .withDetail("usage_percent", usagePercent)
            .build();
    }
}
```

Prometheus alerting rules:

```yaml
groups:
  - name: java_memory
    rules:
      - alert: JavaHeapHigh
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Java heap usage above 85%"
          description: "Heap usage is {{ $value | humanizePercentage }}"

      - alert: JavaHeapCritical
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Java heap usage above 95%"
          description: "Heap usage is {{ $value | humanizePercentage }} - OOM imminent"

      - alert: JavaGCPausesHigh
        expr: rate(jvm_gc_pause_seconds_sum[5m]) / rate(jvm_gc_pause_seconds_count[5m]) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Java GC pause time averaging > 500ms"
```

### 7. Use profiling tools for memory analysis

Production-safe profiling:

```bash
# Async Profiler (low overhead, production-safe)
# Download: https://github.com/jvm-profiling-tools/async-profiler

# Allocation profiling for 60 seconds, output as a flame graph
# (recent releases emit HTML; older 1.x releases emit SVG)
./profiler.sh -e alloc -d 60 -f alloc.html <pid>

# JFR (Java Flight Recorder) - built into Java 11+
# Start a recording
jcmd <pid> JFR.start name=memory duration=5m settings=profile filename=/tmp/recording.jfr

# Analyze with JDK Mission Control
jmc /tmp/recording.jfr

# Key JFR views:
# - Memory > Heap Usage
# - GC > Heap Statistics
# - Profiling > Allocation Stack Trace
```

VisualVM for local debugging:

```bash
# Launch VisualVM
jvisualvm

# Connect to a local or remote JVM
# File > Add JMX Connection

# Key tabs:
# - Monitor: real-time heap usage
# - Threads: thread activity
# - Sampler/Profiler: CPU and memory profiling
# - Heap Dump: capture and analyze

# Remote profiling setup (JVM options; do not disable auth/SSL outside trusted networks):
# -Dcom.sun.management.jmxremote
# -Dcom.sun.management.jmxremote.port=9010
# -Dcom.sun.management.jmxremote.authenticate=false
# -Dcom.sun.management.jmxremote.ssl=false
```

### 8. Implement bounded caching

Replace unbounded caches with bounded alternatives:

```java
// WRONG: Unbounded cache
private static final Map<String, Object> cache = new ConcurrentHashMap<>();

public void put(String key, Object value) {
    cache.put(key, value); // Grows forever
}

// CORRECT: Bounded cache with Caffeine
private static final Cache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(10000)
    .expireAfterWrite(1, TimeUnit.HOURS)
    .expireAfterAccess(30, TimeUnit.MINUTES)
    .recordStats()
    .build();

// Spring Cache abstraction
@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
        cacheManager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10000)
            .expireAfterWrite(1, TimeUnit.HOURS));
        return cacheManager;
    }
}

// LRU cache implementation
public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LRUCache(int maxSize) {
        super(16, 0.75f, true); // Access order
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // Automatically remove the oldest entry
    }
}
```
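To sanity-check the eviction behavior of a `LinkedHashMap`-based LRU cache, here is a self-contained sketch (the class is re-declared inside so the snippet compiles on its own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Self-contained copy of the LinkedHashMap-based LRU cache, plus a
// demonstration that the least-recently-used entry is evicted first.
public class LruDemo {

    static class LRUCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LRUCache(int maxSize) {
            super(16, 0.75f, true); // true = access order, not insertion order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    public static void main(String[] args) {
        LRUCache<String, Integer> cache = new LRUCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // touch "a" so "b" becomes least recently used
        cache.put("c", 3);   // evicts "b", not "a"
        System.out.println(cache.keySet()); // keys now: a and c
    }
}
```

Note the `true` in the constructor: without access ordering, `LinkedHashMap` would evict by insertion order and the `get("a")` would not protect "a".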

### 9. Check for native memory issues

Native memory can reduce available heap:

```bash
# Enable native memory tracking (adds roughly 5-10% overhead)
# Add to JVM options
-XX:NativeMemoryTracking=detail

# View native memory usage
jcmd <pid> VM.native_memory summary

# Example output (values illustrative):
# Native Memory Tracking:
# Total: reserved=5840MB, committed=4310MB
# - Java Heap (reserved=4096MB, committed=4096MB)
# - Class   (reserved=280MB, committed=45MB)
# - Thread  (reserved=520MB, committed=520MB)
# - Code    (reserved=250MB, committed=48MB)
# - GC      (reserved=500MB, committed=190MB)
# - Compiler (reserved=128MB, committed=128MB)
# - Internal (reserved=64MB, committed=64MB)
# - Other   (reserved=2MB, committed=2MB)

# If native memory usage is high:
# - Cap Metaspace: -XX:MaxMetaspaceSize=256m
# - Reduce per-thread stack size (-Xss) or shrink thread pools
# - Check for direct ByteBuffer leaks
```

Direct ByteBuffer leak detection:

```java
// Monitor direct memory usage; look the pool up by name rather than
// assuming its position in the platform MXBean list
BufferPoolMXBean directPool = ManagementFactory
    .getPlatformMXBeans(BufferPoolMXBean.class)
    .stream()
    .filter(pool -> "direct".equals(pool.getName()))
    .findFirst()
    .orElseThrow();

long directMemoryUsed = directPool.getMemoryUsed();
long directBufferCount = directPool.getCount();

log.info("Direct memory: {} bytes, {} buffers", directMemoryUsed, directBufferCount);

// If a direct memory leak is suspected:
// - Check for unclosed Channels
// - Check for unreleased direct ByteBuffers (freed only when GC'd)
// - Use -XX:NativeMemoryTracking=detail
```

### 10. Implement graceful degradation

Prevent OOM from crashing application:

```java
// Memory-aware request handling
@Component
public class MemoryAwareHandler {

    private static final double CRITICAL_THRESHOLD = 0.95;

    private final MemoryMXBean memoryMXBean;

    public MemoryAwareHandler() {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
    }

    public ResponseEntity<?> handleRequest(Request request) {
        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usageRatio = (double) usage.getUsed() / usage.getMax();

        if (usageRatio > CRITICAL_THRESHOLD) {
            // Return 503 instead of crashing
            return ResponseEntity
                .status(503)
                .body("Service temporarily unavailable - memory pressure");
        }

        // Process the request
        return process(request);
    }
}

// Circuit breaker for memory pressure
@Configuration
public class MemoryCircuitBreaker {

    @Bean
    public Customizer<Resilience4JCircuitBreakerFactory> memoryCircuitBreaker() {
        return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
            .circuitBreakerConfig(CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .recordExceptions(OutOfMemoryError.class)
                .build())
            .build());
    }
}
```

For a graceful shutdown on OOM, register a handler script via the JVM option `-XX:OnOutOfMemoryError="/app/scripts/oom-handler.sh %p"`:

```bash
#!/bin/bash
# oom-handler.sh - capture diagnostics, alert, then terminate the process
PID=$1
echo "$(date) OutOfMemoryError detected in process $PID" >> /var/log/app/oom.log

# Capture diagnostic info before killing the process
jstack "$PID" > "/var/log/app/jstack-$PID-$(date +%Y%m%d).txt" 2>&1
jmap -dump:format=b,file="/var/log/app/heap-$PID-$(date +%Y%m%d).hprof" "$PID"

# Notify monitoring
curl -X POST http://alertmanager:9093/api/v1/alerts \
  -d "{\"alerts\":[{\"labels\":{\"severity\":\"critical\",\"alert\":\"JavaOOM\"}}]}"

# Kill the process
kill -9 "$PID"
```

Prevention

  • Enable HeapDumpOnOutOfMemoryError in all environments
  • Set up memory monitoring with Prometheus/Grafana
  • Use bounded caches with eviction policies
  • Implement memory health checks in CI/CD
  • Conduct load testing with memory profiling
  • Document memory requirements for each service
  • Review heap dump analysis as part of incident post-mortems

Related Errors
  • **OutOfMemoryError: Metaspace**: Class metadata exhausted
  • **OutOfMemoryError: GC overhead limit exceeded**: GC running >98% of time
  • **OutOfMemoryError: Unable to create new native thread**: Thread limit reached
  • **OutOfMemoryError: Direct buffer memory**: Direct ByteBuffer exhausted