Introduction

Java GC Overhead Limit Exceeded (java.lang.OutOfMemoryError: GC overhead limit exceeded) occurs when the JVM spends more than 98% of its time performing garbage collection while reclaiming less than 2% of the heap. The error is a protective mechanism - the JVM detects that continued execution would be futile, since nearly all CPU time is consumed by GC with no meaningful work being done. Unlike simple heap exhaustion, this error indicates a fundamental mismatch between memory allocation patterns and the available heap, often caused by memory leaks, aggressive object creation, or a severely undersized heap.
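The 98%/2% thresholds are HotSpot tunables rather than fixed constants. The flags below (shown with their defaults) govern the check for the throughput collectors; treat exact behavior as version- and collector-dependent:

```shell
# Flags governing the GC overhead limit check (HotSpot defaults shown)
-XX:+UseGCOverheadLimit   # The check is enabled by default
-XX:GCTimeLimit=98        # Error if more than 98% of time goes to GC...
-XX:GCHeapFreeLimit=2     # ...while less than 2% of the heap is freed
```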

Symptoms

  • Application throws OutOfMemoryError: GC overhead limit exceeded
  • Application becomes extremely slow before crashing (GC thrashing)
  • CPU usage near 100% but throughput drops to near zero
  • GC logs show Full GC running every few seconds
  • Heap usage returns to near-maximum immediately after GC completes
  • Response times increase 10-100x before crash
  • Issue appears gradually as workload increases over time

Common Causes

  • Memory leak causing heap to fill faster than GC can reclaim
  • Heap size too small for working set
  • Creating large numbers of short-lived objects (allocation storm)
  • Large object graph preventing efficient GC
  • Finalizer queue backlog (objects with finalizers)
  • Weak/Soft references not being cleared fast enough
  • JNI references preventing object collection

Step-by-Step Fix

### 1. Confirm GC overhead diagnosis

Distinguish from other OOM errors:

```bash
# Check error message in logs
grep -E "GC overhead|OutOfMemoryError" /var/log/app/*.log

# Expected output:
# java.lang.OutOfMemoryError: GC overhead limit exceeded

# This is different from:
# - Java heap space (simple heap exhaustion)
# - Metaspace (class metadata exhausted)
# - Unable to create new native thread (OS thread limit)
# - Requested array size exceeds VM limit (array allocation larger than the VM allows)

# Verify with jstat before the crash
jstat -gcutil <pid> 1000

# Watch for the GC overhead pattern (time columns are cumulative seconds):
#  YGC   YGCT   FGC   FGCT    GCT
# 1000    1.5   500   98.5  100.0

# FGCT / GCT * 100 = share of GC time spent in Full GC
# The overhead limit itself compares GC time to wall-clock time:
# if GC consumes > 98% of elapsed time while freeing < 2% of the heap,
# the error is thrown
```
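To quantify the overhead directly, take two `jstat` samples a known interval apart and compare the growth of the cumulative GCT column to wall-clock time. A minimal sketch (the sample values are hypothetical):

```python
def gc_overhead_percent(gct_start: float, gct_end: float,
                        wall_seconds: float) -> float:
    """Share of wall-clock time spent in GC between two jstat samples.

    gct_start / gct_end are readings of the cumulative GCT column
    (total GC seconds) from `jstat -gcutil <pid>`, taken
    wall_seconds apart.
    """
    return (gct_end - gct_start) / wall_seconds * 100

# Hypothetical samples: GCT grew from 100.0s to 159.0s over a 60s window
overhead = gc_overhead_percent(100.0, 159.0, 60.0)
print(f"GC overhead: {overhead:.1f}%")  # GC overhead: 98.3%
```

A sustained reading above 98% means the overhead limit is about to trigger.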

### 2. Analyze GC logs for root cause

Extract GC patterns from logs:

```bash
# Java 8 GC log example
# Parse GC frequency and efficiency
# (match() with an array argument is a gawk extension)

grep "Full GC" gc.log | gawk '{
    # Extract heap before and after GC
    match($0, /([0-9]+)K->([0-9]+)K/, arr);
    before = arr[1];
    after = arr[2];
    freed = before - after;
    percent = (freed / before) * 100;

    print $1, $2, "Before:", before, "After:", after,
          "Freed:", freed, "Efficiency:", percent "%";
}'

# Healthy GC pattern:
# 10:00:00 Full GC 2048000K->512000K Freed: 1536000K Efficiency: 75%
# 10:05:00 Full GC 2100000K->520000K Freed: 1580000K Efficiency: 75%

# GC overhead pattern:
# 10:00:00 Full GC 3900000K->3800000K Freed: 100000K Efficiency: 2.5%
# 10:00:05 Full GC 3950000K->3850000K Freed: 100000K Efficiency: 2.5%
# 10:00:10 Full GC 3980000K->3880000K Freed: 100000K Efficiency: 2.5%
# Notice: very little memory freed, GC running constantly

# Check GC pause times (adjust the regex to your log's time unit)
grep "Full GC" gc.log | gawk '{
    match($0, /([0-9.]+)ms/, arr);
    print "Pause:", arr[1], "ms";
}' | sort -k2 -n | tail -20

# Long GC pauses (>10 seconds) indicate heap issues
```
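The efficiency numbers in the log excerpts above are simple to compute; a small sketch using those same sample values:

```python
def gc_efficiency(heap_before_kb: int, heap_after_kb: int) -> float:
    """Percentage of occupied heap reclaimed by one Full GC."""
    return (heap_before_kb - heap_after_kb) / heap_before_kb * 100

# Healthy pattern: 2048000K -> 512000K
print(gc_efficiency(2048000, 512000))   # 75.0

# GC overhead pattern: 3900000K -> 3800000K
print(gc_efficiency(3900000, 3800000))  # ~2.6 - almost nothing reclaimed
```

Sustained single-digit efficiency across consecutive Full GCs is the signature of GC thrashing.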

Use GC analysis tools:

```bash
# gceasy.io - upload gc.log for detailed analysis
# https://gceasy.io/

# Key metrics to check:
# - GC Frequency: how often GC runs
# - GC Efficiency: memory freed per GC
# - GC Pause Time: application downtime
# - Stop The World time: total application pause

# Or use local tools
# Install a GC log parser (third-party; package name and API may vary)
pip install gclog-parser

# Analyze with Python
python3 -c "
import gclogparser
with open('gc.log') as f:
    events = gclogparser.parse(f)
for event in events:
    if event.type == 'Full GC':
        print(f'{event.timestamp}: {event.heap_before} -> {event.heap_after}')
"
```

### 3. Identify memory leak with heap dump

Capture heap before crash:

```bash
# Auto-generate heap dump on OOM
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app/heapdump.hprof
-XX:OnOutOfMemoryError="jcmd %p GC.heap_dump /var/log/app/forced.hprof"

# Or manually trigger when you see GC thrashing
# Warning: the application pauses while the dump is written
jcmd <pid> GC.heap_dump /tmp/heap.hprof

# For large heaps (>8GB), use the compressed format (JDK 15+)
jcmd <pid> GC.heap_dump -gz=1 /tmp/heap.hprof.gz

# Kubernetes pod
kubectl exec <pod-name> -- jcmd 1 GC.heap_dump /tmp/heap.hprof
kubectl cp <pod-name>:/tmp/heap.hprof ./heap.hprof
```

Analyze heap dump for leak patterns:

```bash
# Open in Eclipse MAT and run queries
# (OQL below is illustrative - MAT's dialect differs from VisualVM's,
# so adapt the syntax to your tool)

# 1. Find memory leak suspects
# Right-click heap dump > Leak Suspects Report

# 2. Find largest objects
SELECT * FROM java.lang.Object[] ORDER BY used_heap DESC LIMIT 50

# 3. Find objects with many instances
SELECT toString(class), COUNT(*) AS count
FROM java.lang.Object
GROUP BY toString(class)
ORDER BY count DESC LIMIT 50

# 4. Find GC roots retaining large objects
# Right-click object > Path to GC Roots > Exclude all phantom/weak references

# 5. Find duplicate strings (common leak)
SELECT toString(s) AS value, COUNT(*) AS count
FROM java.lang.String s
GROUP BY toString(s)
HAVING COUNT(*) > 100
ORDER BY count DESC
```

Common leak patterns in heap dump:

```
Leak Pattern 1: Growing ArrayList/HashMap
java.util.ArrayList (2.1 GB / 45% of heap)
  - internal array: Object[] (2.0 GB)
  - Retained by: com.example.CacheManager.cache
  - 50,000,000 items, never cleared

Leak Pattern 2: Unclosed resources
java.io.FileInputStream (500,000 instances)
  - Each holding a file descriptor
  - Retained by: ThreadLocal in RequestProcessor

Leak Pattern 3: Event listeners
java.util.ArrayList (1.5 GB)
  - Contains: com.example.EventListener[]
  - Listeners registered but never unregistered
  - Each listener holds a reference to the entire UI component tree
```

### 4. Tune GC algorithm for workload

Select appropriate GC algorithm:

```bash
# G1 GC (recommended for most applications)
# Best for: large heaps (>4GB), predictable pause times

-XX:+UseG1GC
-XX:MaxGCPauseMillis=200               # Target max pause time
-XX:G1HeapRegionSize=16m               # Region size (1-32MB)
-XX:G1ReservePercent=10                # Reserve 10% for evacuation
-XX:G1NewSizePercent=30                # Young gen as % of heap
-XX:G1MaxNewSizePercent=60             # Max young gen size
-XX:ParallelGCThreads=8                # Parallel GC threads
-XX:ConcGCThreads=2                    # Concurrent GC threads
-XX:InitiatingHeapOccupancyPercent=45  # Start concurrent marking at 45%

# For GC overhead specifically:
-XX:G1HeapWastePercent=10              # Allow 10% wasted space before mixed GC
-XX:G1MixedGCCountTarget=8             # Number of mixed GCs in a cycle
-XX:G1MixedGCLiveThresholdPercent=85   # Include regions with <85% live data

# Parallel GC (throughput-focused, older JVMs)
# Best for: batch processing, scientific computing

-XX:+UseParallelGC
-XX:ParallelGCThreads=8
-XX:MaxGCPauseMillis=100
-XX:GCTimeRatio=99                     # Target 99% throughput (1% GC time)
-XX:AdaptiveSizePolicyWeight=90

# A GCTimeRatio set too high can itself contribute to GC overhead;
# reduce it to 90 if you are experiencing GC overhead

# ZGC (Java 15+, low latency)
# Best for: low-latency applications, large heaps

-XX:+UseZGC
-XX:ZCollectionInterval=5              # Minimum time between GCs (seconds)
-XX:ZAllocationSpikeTolerance=2.0      # Handle allocation spikes
-XX:ConcGCThreads=4                    # Concurrent threads
# (no pause-time target needed - ZGC pauses are sub-millisecond by design)

# For GC overhead with ZGC:
# ZGC rarely hits the GC overhead limit due to its concurrent operation
# If it does, the heap is severely undersized
```

### 5. Adjust heap configuration

Size heap appropriately:

```bash
# Rule of thumb: heap should be 2-4x the working set size

# Calculate the working set from GC logs:
# average heap usage after Full GC = working set

# If the working set is 2GB:
# Minimum heap: 4GB (2x working set)
# Recommended heap: 6-8GB (3-4x working set)

# Production configuration
-Xms6g   # Initial heap = max heap (avoid resizing)
-Xmx6g   # Max heap

# For containerized deployments
# Kubernetes with an 8GB limit:
-XX:InitialRAMPercentage=75.0   # 6GB initial
-XX:MaxRAMPercentage=75.0       # 6GB max
-XX:+UseContainerSupport        # Respect container limits (Java 8u191+)

# Don't set the heap too close to the container limit
# Leave room for: Metaspace, Code Cache, thread stacks, direct buffers

# Container memory = heap + non-heap
# Non-heap is typically 500MB-1GB

# For an 8GB container:
# Heap:     6GB (75%)
# Non-heap: ~1GB
# Headroom: ~1GB
```

Disable GC overhead limit (not recommended):

```bash
# ONLY for debugging, NOT for production!
# This allows the JVM to continue running despite GC overhead
# The application will be extremely slow but won't crash immediately

-XX:-UseGCOverheadLimit

# Use this to:
# - Capture better diagnostics before the eventual crash
# - Allow the application to drain requests gracefully

# NEVER use this as a "fix" - it masks the underlying problem
```

### 6. Fix memory allocation patterns

Reduce object allocation rate:

```java
// WRONG: Creating millions of temporary objects
public List<String> processData(List<String> input) {
    List<String> result = new ArrayList<>();

    for (String item : input) {
        // Creates a new StringBuilder for each iteration
        String processed = new StringBuilder(item)
            .append("-processed")
            .toString();
        result.add(processed);
    }

    return result; // All intermediate objects go to GC
}

// CORRECT: Reuse StringBuilder
public List<String> processData(List<String> input) {
    List<String> result = new ArrayList<>(input.size());
    StringBuilder sb = new StringBuilder(64); // Reuse the same builder

    for (String item : input) {
        sb.setLength(0); // Clear builder
        sb.append(item).append("-processed");
        result.add(sb.toString());
    }

    return result;
}

// WRONG: Boxed values in hot loops
public long sumList(List<Integer> numbers) {
    long sum = 0;
    for (Integer n : numbers) { // Each element is a separate Integer object
        sum += n;               // Unboxed on every read
    }
    return sum;
}

// CORRECT: Use primitive arrays
public long sumArray(long[] numbers) {
    long sum = 0;
    for (long n : numbers) {
        sum += n; // No boxing
    }
    return sum;
}

// WRONG: String concatenation in a loop
public String buildMessage(List<String> parts) {
    String message = "";
    for (String part : parts) {
        message += part + ", "; // Creates a new String each iteration
    }
    return message;
}

// CORRECT: Use StringBuilder
public String buildMessage(List<String> parts) {
    StringBuilder sb = new StringBuilder(parts.size() * 20);
    for (String part : parts) {
        sb.append(part).append(", ");
    }
    return sb.toString();
}
```

Use object pooling for high-allocation scenarios:

```java
// For objects created/destroyed frequently
// Use Apache Commons Pool

public class ConnectionPool {
    private final GenericObjectPool<Connection> pool;

    public ConnectionPool() {
        GenericObjectPoolConfig<Connection> config = new GenericObjectPoolConfig<>();
        config.setMaxTotal(50);
        config.setMaxIdle(25);
        config.setMinIdle(5);
        config.setBlockWhenExhausted(true);

        pool = new GenericObjectPool<>(new ConnectionFactory(), config);
    }

    public Connection borrow() throws Exception {
        return pool.borrowObject();
    }

    public void returnConnection(Connection conn) {
        pool.returnObject(conn);
    }
}

// For buffers, use Netty's Recycler
// (the pooled object must carry its own recycler handle)
public class PooledBuffer {
    private static final Recycler<PooledBuffer> RECYCLER = new Recycler<PooledBuffer>() {
        @Override
        protected PooledBuffer newObject(Handle<PooledBuffer> handle) {
            return new PooledBuffer(handle);
        }
    };

    private final Recycler.Handle<PooledBuffer> handle;
    public final byte[] data = new byte[8192];

    private PooledBuffer(Recycler.Handle<PooledBuffer> handle) {
        this.handle = handle;
    }

    public static PooledBuffer acquire() {
        return RECYCLER.get();
    }

    public void release() {
        handle.recycle(this);
    }
}
```

### 7. Clear finalizer backlog

Objects with finalizers can block GC:

```bash
# Check the finalizer queue
jcmd <pid> GC.finalizer_info

# Or with JMX:
# jconsole <pid>, then MBeans > java.lang > Memory > ObjectPendingFinalizationCount

# If the queue size keeps growing, finalizers can't keep up
```

Fix finalizer issues:

```java
// WRONG: Relying on finalizers for cleanup
public class LeakyResource {
    private final FileInputStream stream;

    public LeakyResource(String path) throws FileNotFoundException {
        this.stream = new FileInputStream(path);
        // No explicit close - relies on the finalizer
    }

    @Override
    protected void finalize() {
        // The finalizer may not run for hours;
        // objects accumulate in the finalizer queue
        try {
            stream.close();
        } catch (IOException ignored) {
        }
    }
}

// CORRECT: Use try-with-resources
public class SafeResource implements AutoCloseable {
    private final FileInputStream stream;

    public SafeResource(String path) throws FileNotFoundException {
        this.stream = new FileInputStream(path);
    }

    @Override
    public void close() throws IOException {
        stream.close();
    }
}

// Usage
try (SafeResource resource = new SafeResource("file.txt")) {
    // Use resource
} // Automatically closed

// As a safety net, avoid raw PhantomReference cleanup -
// use Cleaner instead (Java 9+).
// Note: the cleanup action must NOT reference the registered object,
// or it will never become unreachable; use a static nested class
public class SafeCleanup implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Holds only the native state, never a reference to SafeCleanup itself
    private static final class State implements Runnable {
        @Override
        public void run() {
            releaseNativeResource(); // Runs at most once, on clean() or GC
        }

        private static void releaseNativeResource() {
            // Release native resources
        }
    }

    private final Cleaner.Cleanable cleanable;

    public SafeCleanup() {
        cleanable = CLEANER.register(this, new State());
    }

    @Override
    public void close() {
        cleanable.clean(); // Explicit cleanup
    }
}
```

### 8. Implement circuit breaker for memory pressure

Prevent memory exhaustion from cascading:

```java
@Component
public class MemoryCircuitBreaker {
    // Assumes an SLF4J logger, e.g. via Lombok @Slf4j

    private final MemoryMXBean memoryMXBean;
    private volatile boolean circuitOpen = false;
    private volatile long lastCheckTime = 0;

    public MemoryCircuitBreaker() {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
    }

    public boolean canAcceptRequest() {
        // Re-evaluate at most every 5 seconds
        long now = System.currentTimeMillis();
        if (now - lastCheckTime < 5000) {
            return !circuitOpen;
        }

        lastCheckTime = now;
        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usageRatio = (double) usage.getUsed() / usage.getMax();

        if (usageRatio > 0.90) {
            circuitOpen = true;
            log.warn("Memory circuit OPEN - heap at {}%",
                String.format("%.1f", usageRatio * 100));
            return false;
        } else if (usageRatio < 0.70) {
            circuitOpen = false;
            log.info("Memory circuit CLOSED - heap at {}%",
                String.format("%.1f", usageRatio * 100));
            return true;
        }

        return !circuitOpen;
    }

    @EventListener
    public void handleRequest(RequestEvent event) {
        if (!canAcceptRequest()) {
            event.reject("Service temporarily unavailable - memory pressure");
            return;
        }

        // Process request
    }
}

// Spring Cloud Circuit Breaker integration
@Configuration
public class MemoryCircuitBreakerConfig {

    @Bean
    public Customizer<Resilience4JCircuitBreakerFactory> memoryCircuitBreaker() {
        return factory -> factory.configureDefault(id -> id
            .with(() -> CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .minimumNumberOfCalls(5)
                // GC overhead surfaces as OutOfMemoryError -
                // there is no separate exception class for it
                .recordExceptions(OutOfMemoryError.class)
                .waitDurationInOpenState(Duration.ofMinutes(5))
                .build()));
    }
}
```
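The open-at-90% / close-at-70% hysteresis above is deliberate: a single threshold would make the circuit flap as GC briefly frees memory. The policy in isolation, independent of the Spring wiring (a sketch with the same thresholds):

```python
class MemoryCircuit:
    """Hysteresis circuit: opens above open_at, closes only below close_at."""

    def __init__(self, open_at: float = 0.90, close_at: float = 0.70):
        self.open_at = open_at
        self.close_at = close_at
        self.open = False

    def can_accept(self, heap_usage_ratio: float) -> bool:
        if heap_usage_ratio > self.open_at:
            self.open = True
        elif heap_usage_ratio < self.close_at:
            self.open = False
        # Between the thresholds, keep the previous state
        return not self.open

circuit = MemoryCircuit()
print(circuit.can_accept(0.95))  # False - opens above 90%
print(circuit.can_accept(0.80))  # False - stays open between thresholds
print(circuit.can_accept(0.60))  # True  - closes below 70%
```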

### 9. Monitor for early detection

Set up proactive monitoring:

```java
// Scheduled memory health check
@Component
public class MemoryHealthMonitor {
    // Assumes an SLF4J logger, e.g. via Lombok @Slf4j

    private final MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
    private final ApplicationEventPublisher publisher;

    public MemoryHealthMonitor(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    @Scheduled(fixedRate = 30000) // Every 30 seconds
    public void checkMemoryHealth() {
        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usedPercent = (double) usage.getUsed() / usage.getMax() * 100;

        if (usedPercent > 85) {
            log.warn("Heap usage above 85%: {}MB / {}MB",
                usage.getUsed() / 1024 / 1024,
                usage.getMax() / 1024 / 1024);
            publisher.publishEvent(new MemoryWarningEvent(this, usedPercent));
        }

        if (usedPercent > 95) {
            log.error("Heap usage CRITICAL: {}MB / {}MB",
                usage.getUsed() / 1024 / 1024,
                usage.getMax() / 1024 / 1024);
            publisher.publishEvent(new MemoryCriticalEvent(this, usedPercent));
        }
    }
}
```

Prometheus alerting rules:

```yaml
groups:
  - name: java_gc
    rules:
      - alert: JavaGCHighFrequency
        expr: rate(jvm_gc_collection_seconds_count[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Java GC running more than once per second"
          description: "GC frequency is {{ $value | humanize }} per second"

      - alert: JavaGCHighOverhead
        # rate() of the seconds counter is GC-seconds per second,
        # i.e. the fraction of time spent in GC
        expr: rate(jvm_gc_collection_seconds_sum[5m]) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Java GC consuming > 50% of CPU time"
          description: "GC overhead is {{ $value | humanizePercentage }}"

      - alert: JavaHeapHigh
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Java heap usage above 85%"

      - alert: JavaHeapNotReclaiming
        # Heap still growing despite more than 60s of GC in 10 minutes
        expr: |
          delta(jvm_memory_used_bytes{area="heap"}[10m]) > 0
          and
          increase(jvm_gc_collection_seconds_sum[10m]) > 60
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Java heap not being reclaimed despite GC"
          description: "Possible memory leak - GC running but heap not decreasing"
```

### 10. Implement graceful degradation

Handle memory pressure without crashing:

```java
// Graceful shutdown on memory pressure
@Component
public class GracefulMemoryShutdown {
    // Assumes an SLF4J logger, e.g. via Lombok @Slf4j

    private final MemoryMXBean memoryMXBean;
    private final ApplicationEventPublisher publisher;
    private volatile boolean shutdownInitiated = false;

    public GracefulMemoryShutdown(ApplicationEventPublisher publisher) {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
        this.publisher = publisher;
    }

    @Scheduled(fixedRate = 10000)
    public void monitorMemory() {
        if (shutdownInitiated) return;

        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usageRatio = (double) usage.getUsed() / usage.getMax();

        if (usageRatio > 0.95) {
            shutdownInitiated = true;
            log.error("Critical memory pressure ({}%) - initiating graceful shutdown",
                String.format("%.1f", usageRatio * 100));

            // Publish a custom application event so components can drain
            // (ContextClosedEvent cannot be constructed here - its
            // constructor requires the ApplicationContext)
            publisher.publishEvent(new MemoryShutdownEvent(this));

            // Then exit after a grace period
            new Thread(() -> {
                try {
                    // Give in-flight requests time to complete
                    Thread.sleep(30000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.exit(1);
            }, "memory-shutdown").start();
        }
    }
}

// Drop low-priority work under memory pressure
@Component
public class AdaptiveWorkloadManager {
    // Assumes an SLF4J logger, e.g. via Lombok @Slf4j

    private volatile WorkloadPriority currentPriority = WorkloadPriority.ALL;
    private final Queue<WorkItem> lowPriorityQueue = new ConcurrentLinkedQueue<>();

    @EventListener
    public void handleMemoryWarning(MemoryWarningEvent event) {
        if (event.getUsagePercent() > 90) {
            currentPriority = WorkloadPriority.HIGH_ONLY;
            log.info("Switching to HIGH_ONLY workload mode");
        }
    }

    public void processWorkItem(WorkItem item) {
        if (currentPriority == WorkloadPriority.HIGH_ONLY
                && item.getPriority() != WorkItemPriority.HIGH) {
            // Queue low-priority work for later
            lowPriorityQueue.add(item);
            return;
        }

        // Process high-priority work
        item.process();
    }
}
```

Prevention

  • Set heap to 2-4x working set size based on load testing
  • Monitor GC overhead metric continuously
  • Set up alerts for GC frequency and pause times
  • Conduct regular load tests with memory profiling
  • Use object pooling for high-allocation scenarios
  • Avoid finalizers - use AutoCloseable instead
  • Implement memory-aware circuit breakers
  • Document heap requirements for each service

Related Errors

  • **OutOfMemoryError: Java heap space**: Simple heap exhaustion
  • **OutOfMemoryError: Metaspace**: Class metadata exhausted
  • **OutOfMemoryError: Unable to create new native thread**: OS thread limit reached
  • **OutOfMemoryError: Direct buffer memory**: NIO direct buffer pool exhausted