## Introduction
Java GC Overhead Limit Exceeded (java.lang.OutOfMemoryError: GC overhead limit exceeded) occurs when the JVM spends more than 98% of its time performing garbage collection but reclaims less than 2% of the heap. This error is a protective mechanism - the JVM detects that continued execution would be futile since all CPU time is consumed by GC with no meaningful work being done. Unlike simple heap exhaustion, this error indicates a fundamental mismatch between memory allocation patterns and available heap, often caused by memory leaks, aggressive object creation, or severely undersized heap.
## Symptoms
- Application throws `OutOfMemoryError: GC overhead limit exceeded`
- Application becomes extremely slow before crashing (GC thrashing)
- CPU usage near 100% but throughput drops to near zero
- GC logs show Full GC running every few seconds
- Heap usage returns to near-maximum immediately after GC completes
- Response times increase 10-100x before crash
- Issue appears gradually as workload increases over time
## Common Causes
- Memory leak causing heap to fill faster than GC can reclaim
- Heap size too small for working set
- Creating large numbers of short-lived objects (allocation storm)
- Large object graph preventing efficient GC
- Finalizer queue backlog (objects with finalizers)
- Weak/Soft references not being cleared fast enough
- JNI references preventing object collection
## Step-by-Step Fix
### 1. Confirm GC overhead diagnosis
Distinguish from other OOM errors:
```bash
# Check error message in logs
grep -E "GC overhead|OutOfMemoryError" /var/log/app/*.log

# Expected output:
# java.lang.OutOfMemoryError: GC overhead limit exceeded

# This is different from:
# - Java heap space (simple heap exhaustion)
# - Metaspace (class metadata exhausted)
# - Unable to create new native thread (OS thread limit)
# - Requested array size exceeds VM limit (array larger than the VM allows)

# Verify with jstat before the crash (1000 ms sampling interval)
jstat -gcutil <pid> 1000

# Watch the FGC (Full GC count) and GCT (total GC time) columns:
# if FGC climbs on every sample and GCT grows by nearly the full
# sampling interval each second, almost all time is going to GC.

# Calculate GC overhead percentage:
# run with -t to get the JVM uptime column, then
# GCT / uptime * 100 = percentage of run time spent in GC
# If > 98% (while reclaiming < 2% of heap), the overhead limit triggers
```
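The same overhead check can be done from inside the process. This is a minimal sketch using the standard `GarbageCollectorMXBean` and `RuntimeMXBean` APIs; 98% is the JVM's default trigger threshold:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverheadCheck {

    // Fraction of JVM uptime spent in GC, as a percentage.
    static double overheadPercent(long gcTimeMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) return 0.0;
        return 100.0 * gcTimeMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcTime = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcTime += Math.max(0, gc.getCollectionTime()); // -1 if unsupported
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead: %.2f%% (%d ms of %d ms)%n",
                overheadPercent(gcTime, uptime), gcTime, uptime);
    }
}
```

Run this from a health endpoint or scheduled task; values persistently above ~50% are a strong early warning long before the 98% limit fires.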
### 2. Analyze GC logs for root cause
Extract GC patterns from logs:
```bash
# Java 8 GC log example
# Parse GC frequency and efficiency (requires GNU awk for match() with an array)
grep "Full GC" gc.log | awk '{
    # Extract heap before and after GC
    match($0, /([0-9]+)K->([0-9]+)K/, arr);
    before = arr[1];
    after = arr[2];
    freed = before - after;
    percent = (freed / before) * 100;
    print $1, $2, "Before:", before, "After:", after, "Freed:", freed, "Efficiency:", percent "%";
}'

# Healthy GC pattern:
# 10:00:00 Full GC 2048000K->512000K Freed: 1536000K Efficiency: 75%
# 10:05:00 Full GC 2100000K->520000K Freed: 1580000K Efficiency: 75%

# GC overhead pattern:
# 10:00:00 Full GC 3900000K->3800000K Freed: 100000K Efficiency: 2.5%
# 10:00:05 Full GC 3950000K->3850000K Freed: 100000K Efficiency: 2.5%
# 10:00:10 Full GC 3980000K->3880000K Freed: 100000K Efficiency: 2.5%
# Notice: very little memory freed, GC running constantly

# Check GC pause times (sort numerically on the second field)
grep "Full GC" gc.log | awk '{
    match($0, /([0-9.]+)ms/, arr);
    print "Pause:", arr[1], "ms";
}' | sort -k2 -n | tail -20

# Long GC pauses (>10 seconds) indicate heap issues
```
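The efficiency figures in the patterns above are simple arithmetic; a small helper makes the calculation explicit and is handy when eyeballing raw GC log lines:

```java
class GcEfficiency {

    // Percentage of the pre-GC heap that a collection actually freed.
    static double efficiencyPercent(long beforeKb, long afterKb) {
        if (beforeKb <= 0) return 0.0;
        return 100.0 * (beforeKb - afterKb) / beforeKb;
    }

    public static void main(String[] args) {
        // Healthy cycle: 2048000K -> 512000K
        System.out.printf("healthy:   %.1f%%%n", efficiencyPercent(2_048_000, 512_000));
        // Thrashing cycle: 3900000K -> 3800000K
        System.out.printf("thrashing: %.1f%%%n", efficiencyPercent(3_900_000, 3_800_000));
    }
}
```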
Use GC analysis tools:
```bash
# gceasy.io - upload gc.log for detailed analysis
# https://gceasy.io/

# Key metrics to check:
# - GC Frequency: how often GC runs
# - GC Efficiency: memory freed per GC
# - GC Pause Time: application downtime
# - Stop The World time: total application pause

# Or parse locally. The package and module below are illustrative;
# substitute whatever GC-log parser your team actually uses.
pip install gclog-parser

python3 - <<'EOF'
import gclogparser
with open('gc.log') as f:
    events = gclogparser.parse(f)
for event in events:
    if event.type == 'Full GC':
        print(f'{event.timestamp}: {event.heap_before} -> {event.heap_after}')
EOF
```
### 3. Identify memory leak with heap dump
Capture heap before crash:
```bash
# JVM flags to auto-generate a heap dump on OOM
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app/heapdump.hprof
-XX:OnOutOfMemoryError="jcmd %p GC.heap_dump /var/log/app/forced.hprof"

# Or manually trigger when you see GC thrashing
# Warning: the application pauses while the dump is written
jcmd <pid> GC.heap_dump /tmp/heap.hprof

# For large heaps (>8GB), use the compressed format (Java 15+)
jcmd <pid> GC.heap_dump -gz=1 /tmp/heap.hprof.gz

# Kubernetes pod
kubectl exec <pod-name> -- jcmd 1 GC.heap_dump /tmp/heap.hprof
kubectl cp <pod-name>:/tmp/heap.hprof ./heap.hprof
```
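Heap dumps can also be triggered from inside the application, which helps when shell access to the pod is restricted. This sketch uses the HotSpot-specific `HotSpotDiagnosticMXBean` (available on OpenJDK/Oracle JVMs; the target filename must end in `.hprof`):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {

    // Programmatic heap dump via the HotSpot diagnostic MXBean.
    // live = true dumps only reachable objects (forces a full GC first).
    static void dumpHeap(String path, boolean live) throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("dumps").resolve("heap.hprof");
        dumpHeap(out.toString(), true);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Wire this to an admin-only endpoint; never expose it unauthenticated, since a dump pauses the JVM and contains full object contents.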
Analyze heap dump for leak patterns:
```bash
# Open the dump in Eclipse MAT (Memory Analyzer Tool)
# Note: MAT's OQL dialect does not support GROUP BY / ORDER BY / LIMIT,
# so use the built-in views and queries below instead

# 1. Find memory leak suspects
#    Right-click heap dump > Leak Suspects Report

# 2. Find the largest objects
#    Open the Dominator Tree, sorted by retained heap

# 3. Find classes with the most instances
#    Open the Histogram, sorted by instance count

# 4. Find GC roots retaining large objects
#    Right-click object > Path to GC Roots > Exclude all phantom/weak/soft references

# 5. Find duplicate strings (a common leak)
#    Run Java Basics > Group By Value on java.lang.String
```
Common leak patterns in heap dump:
```
Leak Pattern 1: Growing ArrayList/HashMap
  java.util.ArrayList (2.1 GB / 45% of heap)
  - internal array: byte[] (2.0 GB)
  - Retained by: com.example.CacheManager.cache
  - 50,000,000 items, never cleared

Leak Pattern 2: Unclosed resources
  java.io.FileInputStream (500,000 instances)
  - Each holding a file descriptor
  - Retained by: ThreadLocal in RequestProcessor

Leak Pattern 3: Event listeners
  java.util.ArrayList (1.5 GB)
  - Contains: com.example.EventListener[]
  - Listeners registered but never unregistered
  - Each listener holds a reference to the entire UI component tree
```
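Leak Pattern 1 (an unbounded cache) is often fixed simply by capping the collection. A minimal sketch using `LinkedHashMap`'s `removeEldestEntry` hook for LRU eviction (the class name is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: evicts the eldest entry once maxEntries is exceeded,
// so the map can never grow without limit the way the leaked cache did.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true for LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For production caches prefer a purpose-built library (e.g. Caffeine) that also supports weight-based and time-based eviction, but the principle is the same: every cache needs a bound.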
### 4. Tune GC algorithm for workload
Select appropriate GC algorithm:
```bash
# G1 GC (recommended for most applications)
# Best for: large heaps (>4GB), predictable pause times
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200              # Target max pause time
-XX:G1HeapRegionSize=16m              # Region size (1-32MB, power of two)
-XX:G1ReservePercent=10               # Reserve 10% for evacuation
-XX:G1NewSizePercent=30               # Min young gen as % of heap
-XX:G1MaxNewSizePercent=60            # Max young gen size
-XX:ParallelGCThreads=8               # Parallel GC threads
-XX:ConcGCThreads=2                   # Concurrent GC threads
-XX:InitiatingHeapOccupancyPercent=45 # Start concurrent marking at 45%

# For GC overhead specifically:
-XX:G1HeapWastePercent=10             # Allow 10% wasted space before mixed GC
-XX:G1MixedGCCountTarget=8            # Number of mixed GCs in a cycle
-XX:G1MixedGCLiveThresholdPercent=85  # Include regions with <85% live data
# Note: several G1 percent flags (G1NewSizePercent, G1MaxNewSizePercent,
# G1HeapWastePercent, G1MixedGCLiveThresholdPercent) are experimental
# and require -XX:+UnlockExperimentalVMOptions

# Parallel GC (throughput-focused, older JVMs)
# Best for: batch processing, scientific computing
-XX:+UseParallelGC
-XX:ParallelGCThreads=8
-XX:MaxGCPauseMillis=100
-XX:GCTimeRatio=99                    # Target 99% throughput (1% GC time)
-XX:AdaptiveSizePolicyWeight=90

# A GCTimeRatio that is too high can itself contribute to GC overhead;
# reduce it to 90 if you are experiencing the error

# ZGC (production-ready since Java 15, low latency)
# Best for: low-latency applications, large heaps
-XX:+UseZGC
-XX:ZCollectionInterval=5             # Minimum time between GCs (seconds)
-XX:ZAllocationSpikeTolerance=2.0     # Handle allocation spikes
-XX:ConcGCThreads=4                   # Concurrent threads
# Note: ZGC has no pause-time target flag; pauses are sub-millisecond by design

# For GC overhead with ZGC:
# ZGC rarely hits the GC overhead limit due to its concurrent operation.
# If it does, the heap is severely undersized.
```
### 5. Adjust heap configuration
Size heap appropriately:
```bash
# Rule of thumb: heap should be 2-4x the working set size

# Calculate the working set from GC logs:
# average heap usage after a Full GC = working set

# If the working set is 2GB:
# Minimum heap: 4GB (2x working set)
# Recommended heap: 6-8GB (3-4x working set)

# Production configuration
-Xms6g   # Initial heap = max heap (avoid resizing)
-Xmx6g   # Max heap

# For containerized deployments
# Kubernetes with an 8GB limit:
-XX:InitialRAMPercentage=75.0   # 6GB initial
-XX:MaxRAMPercentage=75.0       # 6GB max
-XX:+UseContainerSupport        # Respect container limits (default since Java 8u191)

# Don't set the heap too close to the container limit.
# Leave room for: Metaspace, code cache, thread stacks, direct buffers

# Container memory = heap + non-heap
# Non-heap is typically 500MB-1GB

# For an 8GB container:
# Heap: 6GB (75%)
# Non-heap: ~1GB
# Headroom: ~1GB
```
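The 2-4x sizing rule above is easy to encode. A small helper (illustrative, not part of any library) turns an observed working set into an `-Xmx` value:

```java
class HeapSizing {

    // Recommended -Xmx in MB: 2-4x the observed working set
    // (average heap usage after a Full GC).
    static long recommendedHeapMb(long workingSetMb, double multiplier) {
        if (multiplier < 2.0 || multiplier > 4.0) {
            throw new IllegalArgumentException("multiplier should be between 2 and 4");
        }
        return Math.round(workingSetMb * multiplier);
    }

    public static void main(String[] args) {
        long workingSet = 2048; // 2 GB observed after Full GC
        System.out.println("-Xmx" + recommendedHeapMb(workingSet, 3.0) + "m");
    }
}
```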
Disable GC overhead limit (not recommended):
```bash
# ONLY for debugging, NOT for production!
# This allows the JVM to keep running despite GC overhead.
# The application will be extremely slow but won't crash immediately.
-XX:-UseGCOverheadLimit

# Use this to:
# - Capture better diagnostics before the eventual crash
# - Allow the application to drain requests gracefully

# NEVER use this as a "fix" - it masks the underlying problem
```
### 6. Fix memory allocation patterns
Reduce object allocation rate:
```java
// WRONG: creating millions of temporary objects
public List<String> processData(List<String> input) {
    List<String> result = new ArrayList<>();
    for (String item : input) {
        // Creates a new StringBuilder for each iteration
        String processed = new StringBuilder(item)
            .append("-processed")
            .toString();
        result.add(processed);
    }
    return result; // All intermediate objects go to GC
}

// CORRECT: reuse the StringBuilder
public List<String> processData(List<String> input) {
    List<String> result = new ArrayList<>(input.size());
    StringBuilder sb = new StringBuilder(64); // Reuse the same builder
    for (String item : input) {
        sb.setLength(0); // Clear the builder
        sb.append(item).append("-processed");
        result.add(sb.toString());
    }
    return result;
}

// WRONG: boxing/unboxing in loops
public long sumList(List<Integer> numbers) {
    long sum = 0;
    for (Integer n : numbers) {
        sum += n; // Auto-unboxing on each iteration
    }
    return sum;
}

// CORRECT: use primitive arrays
public long sumArray(long[] numbers) {
    long sum = 0;
    for (long n : numbers) {
        sum += n; // No boxing
    }
    return sum;
}

// WRONG: String concatenation in a loop
public String buildMessage(List<String> parts) {
    String message = "";
    for (String part : parts) {
        message += part + ", "; // Creates a new String each iteration
    }
    return message;
}

// CORRECT: use StringBuilder
public String buildMessage(List<String> parts) {
    StringBuilder sb = new StringBuilder(parts.size() * 20);
    for (String part : parts) {
        sb.append(part).append(", ");
    }
    return sb.toString();
}
```
Use object pooling for high-allocation scenarios:
```java
// For objects created/destroyed frequently
// Use Apache Commons Pool
public class ConnectionPool {
    private final GenericObjectPool<Connection> pool;

    public ConnectionPool() {
        GenericObjectPoolConfig<Connection> config = new GenericObjectPoolConfig<>();
        config.setMaxTotal(50);
        config.setMaxIdle(25);
        config.setMinIdle(5);
        config.setBlockWhenExhausted(true);
        pool = new GenericObjectPool<>(new ConnectionFactory(), config);
    }

    public Connection borrow() throws Exception {
        return pool.borrowObject();
    }

    public void returnConnection(Connection conn) {
        pool.returnObject(conn);
    }
}

// For buffers, use Netty's Recycler. The pooled object must hold its own
// handle and recycle itself; a raw byte[] cannot, so wrap it.
public class PooledBuffer {
    private static final Recycler<PooledBuffer> RECYCLER = new Recycler<PooledBuffer>() {
        @Override
        protected PooledBuffer newObject(Handle<PooledBuffer> handle) {
            return new PooledBuffer(handle);
        }
    };

    public final byte[] bytes = new byte[8192];
    private final Recycler.Handle<PooledBuffer> handle;

    private PooledBuffer(Recycler.Handle<PooledBuffer> handle) {
        this.handle = handle;
    }

    public static PooledBuffer acquire() {
        return RECYCLER.get();
    }

    public void release() {
        handle.recycle(this);
    }
}
```
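If pulling in Commons Pool or Netty is not an option, the same idea can be sketched with nothing but the JDK: a bounded queue of reusable objects. This is an illustrative minimal pool, not a production implementation (no validation, no eviction):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal dependency-free pool: a bounded queue of reusable objects.
// borrow() falls back to fresh allocation when the pool is empty;
// release() silently drops the object when the pool is already full.
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    SimplePool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    T borrow() {
        T obj = idle.poll();
        return obj != null ? obj : factory.get();
    }

    void release(T obj) {
        idle.offer(obj); // dropped (and left to GC) when the pool is full
    }
}
```

Typical usage is borrow in a try block and release in finally, so buffers return to the pool even on exceptions.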
### 7. Clear finalizer backlog
Objects with finalizers can block GC:
```bash
# Check the finalizer queue
jcmd <pid> GC.finalizer_info

# Or with JMX: connect jconsole to the process and inspect
# java.lang:type=Memory > ObjectPendingFinalizationCount

# If the pending count keeps growing, finalizers can't keep up
```
Fix finalizer issues:
```java
// WRONG: relying on finalizers for cleanup
public class LeakyResource {
    private final FileInputStream stream;

    public LeakyResource(String path) throws FileNotFoundException {
        this.stream = new FileInputStream(path);
        // No explicit close - relies on the finalizer
    }

    @Override
    protected void finalize() throws Throwable {
        // The finalizer may not run for hours;
        // objects accumulate in the finalizer queue
        stream.close();
    }
}

// CORRECT: use try-with-resources
public class SafeResource implements AutoCloseable {
    private final FileInputStream stream;

    public SafeResource(String path) throws FileNotFoundException {
        this.stream = new FileInputStream(path);
    }

    @Override
    public void close() throws IOException {
        stream.close();
    }
}

// Usage
try (SafeResource resource = new SafeResource("file.txt")) {
    // Use resource
} // Automatically closed

// Avoid finalizers and hand-rolled PhantomReference plumbing;
// use Cleaner instead (Java 9+). The cleanup action must NOT
// reference the object being cleaned, or the object can never
// become unreachable - hold the state in a static nested class.
public class SafeCleanup implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Holds only the state needed for cleanup,
    // never a reference to the enclosing SafeCleanup instance
    private static final class NativeState implements Runnable {
        @Override
        public void run() {
            releaseNativeResource();
        }

        private static void releaseNativeResource() {
            // Release native resources
        }
    }

    private final Cleaner.Cleanable cleanable;

    public SafeCleanup() {
        cleanable = CLEANER.register(this, new NativeState());
    }

    @Override
    public void close() {
        cleanable.clean(); // Explicit, idempotent cleanup
    }
}
```
### 8. Implement circuit breaker for memory pressure
Prevent memory exhaustion from cascading:
```java
@Component
public class MemoryCircuitBreaker {

    private final MemoryMXBean memoryMXBean;
    private volatile boolean circuitOpen = false;
    private volatile long lastCheckTime = 0;

    public MemoryCircuitBreaker() {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
    }

    public boolean canAcceptRequest() {
        // Re-check at most every 5 seconds
        long now = System.currentTimeMillis();
        if (now - lastCheckTime < 5000) {
            return !circuitOpen;
        }
        lastCheckTime = now;

        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usageRatio = (double) usage.getUsed() / usage.getMax();

        if (usageRatio > 0.90) {
            circuitOpen = true;
            log.warn("Memory circuit OPEN - heap at {}%", String.format("%.1f", usageRatio * 100));
            return false;
        } else if (usageRatio < 0.70) {
            circuitOpen = false;
            log.info("Memory circuit CLOSED - heap at {}%", String.format("%.1f", usageRatio * 100));
            return true;
        }
        return !circuitOpen;
    }

    @EventListener
    public void handleRequest(RequestEvent event) {
        if (!canAcceptRequest()) {
            event.reject("Service temporarily unavailable - memory pressure");
            return;
        }
        // Process request
    }
}

// Spring Cloud Circuit Breaker integration
// Note: there is no GCOverheadLimitExceeded class - the error is an
// OutOfMemoryError whose message is "GC overhead limit exceeded"
@Configuration
public class MemoryCircuitBreakerConfig {

    @Bean
    public Customizer<Resilience4JCircuitBreakerFactory> memoryCircuitBreaker() {
        return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
            .circuitBreakerConfig(CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .slidingWindowSize(10)
                .minimumNumberOfCalls(5)
                .recordExceptions(OutOfMemoryError.class)
                .waitDurationInOpenState(Duration.ofMinutes(5))
                .build())
            .build());
    }
}
```
### 9. Monitor for early detection
Set up proactive monitoring:
```java
// Scheduled memory health check
@Component
public class MemoryHealthMonitor {

    private final MemoryMXBean memoryMXBean;
    private final ApplicationEventPublisher publisher;

    public MemoryHealthMonitor(ApplicationEventPublisher publisher) {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
        this.publisher = publisher;
    }

    @Scheduled(fixedRate = 30000) // Every 30 seconds
    public void checkMemoryHealth() {
        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usedPercent = (double) usage.getUsed() / usage.getMax() * 100;

        if (usedPercent > 95) {
            log.error("Heap usage CRITICAL: {}MB / {}MB",
                usage.getUsed() / 1024 / 1024, usage.getMax() / 1024 / 1024);
            publisher.publishEvent(new MemoryCriticalEvent(this, usedPercent));
        } else if (usedPercent > 85) {
            log.warn("Heap usage above 85%: {}MB / {}MB",
                usage.getUsed() / 1024 / 1024, usage.getMax() / 1024 / 1024);
            publisher.publishEvent(new MemoryWarningEvent(this, usedPercent));
        }
    }
}
```
Prometheus alerting rules:
```yaml
groups:
  - name: java_gc
    rules:
      - alert: JavaGCHighFrequency
        expr: rate(jvm_gc_collection_seconds_count[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Java GC running more than once per second"
          description: "GC frequency is {{ $value | humanize }} per second"

      - alert: JavaGCHighOverhead
        # rate() of the seconds counter is already GC-seconds per second
        expr: rate(jvm_gc_collection_seconds_sum[5m]) > 0.5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Java GC consuming > 50% of CPU time"
          description: "GC overhead is {{ $value | humanizePercentage }}"

      - alert: JavaHeapHigh
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Java heap usage above 85%"

      - alert: JavaHeapNotReclaiming
        # Heap still growing even though GC is busy
        expr: |
          delta(jvm_memory_used_bytes{area="heap"}[10m]) > 0
          and
          rate(jvm_gc_collection_seconds_sum[10m]) > 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Java heap not being reclaimed despite GC"
          description: "Possible memory leak - GC running but heap not decreasing"
```
### 10. Implement graceful degradation
Handle memory pressure without crashing:
```java
// Graceful shutdown on memory pressure
@Component
public class GracefulMemoryShutdown {

    private final MemoryMXBean memoryMXBean;
    private final ConfigurableApplicationContext context;
    private volatile boolean shutdownInitiated = false;

    public GracefulMemoryShutdown(ConfigurableApplicationContext context) {
        this.memoryMXBean = ManagementFactory.getMemoryMXBean();
        this.context = context;
    }

    @Scheduled(fixedRate = 10000)
    public void monitorMemory() {
        if (shutdownInitiated) return;

        MemoryUsage usage = memoryMXBean.getHeapMemoryUsage();
        double usageRatio = (double) usage.getUsed() / usage.getMax();

        if (usageRatio > 0.95) {
            shutdownInitiated = true;
            log.error("Critical memory pressure ({}%) - initiating graceful shutdown",
                Math.round(usageRatio * 100));

            new Thread(() -> {
                try {
                    // Give in-flight requests time to complete
                    Thread.sleep(30000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Closing the context fires ContextClosedEvent and runs shutdown
                // hooks; do not publish ContextClosedEvent manually
                context.close();
                System.exit(1);
            }, "memory-shutdown").start();
        }
    }
}

// Drop low-priority work under memory pressure
@Component
public class AdaptiveWorkloadManager {

    private final Queue<WorkItem> lowPriorityQueue = new ConcurrentLinkedQueue<>();
    private volatile WorkloadPriority currentPriority = WorkloadPriority.ALL;

    @EventListener
    public void handleMemoryWarning(MemoryWarningEvent event) {
        if (event.getUsagePercent() > 90) {
            currentPriority = WorkloadPriority.HIGH_ONLY;
            log.info("Switching to HIGH_ONLY workload mode");
        }
    }

    public void processWorkItem(WorkItem item) {
        if (currentPriority == WorkloadPriority.HIGH_ONLY
                && item.getPriority() != WorkItemPriority.HIGH) {
            // Queue low-priority work for later
            lowPriorityQueue.add(item);
            return;
        }
        // Process high-priority work
        item.process();
    }
}
```
## Prevention
- Set heap to 2-4x working set size based on load testing
- Monitor GC overhead metric continuously
- Set up alerts for GC frequency and pause times
- Conduct regular load tests with memory profiling
- Use object pooling for high-allocation scenarios
- Avoid finalizers - use AutoCloseable instead
- Implement memory-aware circuit breakers
- Document heap requirements for each service
## Related Errors
- **OutOfMemoryError: Java heap space**: Simple heap exhaustion
- **OutOfMemoryError: Metaspace**: Class metadata exhausted
- **OutOfMemoryError: Unable to create new native thread**: Thread creation failed
- **OutOfMemoryError: Direct buffer memory**: NIO buffer pool exhausted