Introduction

The G1 (Garbage-First) collector aims to meet pause time targets by dividing the heap into regions and collecting the ones with the most garbage first. When the heap is undersized, allocation rate is high, or large objects fill regions quickly, G1 must perform full GCs or mixed collections that exceed the target pause time. This causes request latency spikes, timeout errors, and SLA violations.

Symptoms

  • P99 latency spikes correlating with GC pauses
  • Mixed collection pauses, e.g. Pause Young (Mixed) (G1 Evacuation Pause) 450ms, exceeding the target
  • Full GC events: Pause Full (Allocation Failure) 1200ms
  • Application timeouts during GC pauses
  • G1 humongous allocation filling regions with large objects

```
[2024-01-15T10:30:00.123+0000] GC(45) Pause Young (Normal) (G1 Evacuation Pause) 1024M->856M(2048M) 380.5ms
# Target was 200ms, actual was 380ms - SLA violation!

[2024-01-15T10:30:15.456+0000] GC(46) Pause Full (Allocation Failure) 1800M->1200M(2048M) 1450.2ms
# Full GC: stop-the-world for 1.45 seconds!
```

Common Causes

  • MaxGCPauseMillis target too aggressive for the heap size
  • Heap too small causing frequent collections
  • Humongous objects (≥ half the region size) fragmenting the heap
  • Promotion failure causing full GC
  • Allocation rate exceeding GC throughput capacity

Step-by-Step Fix

  1. Analyze GC logs:

```bash
# Enable detailed GC logging
java -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar app.jar

# Analyze with GCViewer or GCeasy. Look for:
# - Pause times exceeding MaxGCPauseMillis
# - Full GC frequency
# - Heap occupancy before/after GC

# Quick analysis with grep
grep "Pause" gc.log | awk '{print $NF}' | sort -n | tail -10
```
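The same check can be scripted outside the shell. A minimal Java sketch (the class name `GcPauseScan` and its helper are illustrative, not part of any library) that pulls pause durations out of unified-logging lines like the ones above and lists those exceeding the target, worst first:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseScan {
    // Matches the trailing "<millis>ms" on unified-logging pause lines,
    // e.g. "... Pause Young (Normal) 1024M->856M(2048M) 380.5ms"
    private static final Pattern PAUSE_MS =
            Pattern.compile("Pause.*\\s(\\d+(?:\\.\\d+)?)ms");

    // Returns pause durations above thresholdMs, sorted worst first
    public static List<Double> pausesOver(List<String> logLines, double thresholdMs) {
        List<Double> offenders = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE_MS.matcher(line);
            if (m.find()) {
                double ms = Double.parseDouble(m.group(1));
                if (ms > thresholdMs) offenders.add(ms);
            }
        }
        offenders.sort(Comparator.reverseOrder());
        return offenders;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[2024-01-15T10:30:00.123+0000] GC(45) Pause Young (Normal) 1024M->856M(2048M) 380.5ms",
            "[2024-01-15T10:30:15.456+0000] GC(46) Pause Full 1800M->1200M(2048M) 1450.2ms",
            "[2024-01-15T10:30:20.000+0000] GC(47) Pause Young (Normal) 900M->400M(2048M) 95.0ms");
        System.out.println(pausesOver(lines, 200.0)); // → [1450.2, 380.5]
    }
}
```

This is essentially the grep/awk pipeline with the threshold comparison built in, which makes it easy to drop into an alerting job.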

  2. Tune G1 collector settings:

```bash
# Set a realistic pause target (200-500ms for most apps).
# Fixed heap (-Xms = -Xmx) avoids resize pauses; region size is
# auto-detected by default; a lower InitiatingHeapOccupancyPercent
# starts mixed GCs earlier.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=300 \
     -Xms4g -Xmx4g \
     -XX:G1HeapRegionSize=16m \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -jar app.jar
```
  3. Reduce humongous allocations:

```bash
# G1 region size should be larger than your biggest objects.
# Default region size: heap_size / 2048 (min 1MB, max 32MB)
# Humongous threshold: region_size / 2

# If you have many 10MB objects, set the region size to 32MB
java -XX:+UseG1GC \
     -XX:G1HeapRegionSize=32m \
     -Xmx8g \
     -jar app.jar

# Now objects under 16MB are no longer humongous
```
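The half-region rule is easy to sanity-check in code. A small sketch (the class and helper names are illustrative) applying G1's threshold, under which an allocation of at least half a region is humongous:

```java
public class HumongousCheck {
    // G1 treats an allocation as humongous when it occupies
    // at least half of a heap region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With 16MB regions, a 10MB buffer is humongous...
        System.out.println(isHumongous(10 * mb, 16 * mb)); // true
        // ...but with -XX:G1HeapRegionSize=32m it is an ordinary allocation.
        System.out.println(isHumongous(10 * mb, 32 * mb)); // false
    }
}
```

Note that the threshold is inclusive: with 32MB regions, a 16MB object is still humongous, which is why the comment above says "under 16MB".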

  4. Increase heap to reduce GC frequency:

```bash
# Larger heap = fewer collections = fewer pause opportunities
# But: a larger heap can mean longer individual pauses
# Find the sweet spot with load testing

# Double the heap; AlwaysPreTouch touches all pages at startup
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=300 \
     -XX:+AlwaysPreTouch \
     -jar app.jar
```

  5. Consider ZGC for sub-millisecond pauses:

```bash
# ZGC (production-ready since Java 15): pauses under 1ms
# regardless of heap size.
# ZGenerational enables generational ZGC on Java 21+.
java -XX:+UseZGC \
     -Xmx16g \
     -XX:+ZGenerational \
     -jar app.jar

# Best suited to latency-critical applications with large heaps
```

Prevention

  • Monitor GC metrics with JMX: java.lang:type=GarbageCollector
  • Set up Grafana dashboards tracking GC pause times and frequency
  • Alert when P99 pause time exceeds 80% of MaxGCPauseMillis
  • Load test with production-like data volumes to size heap correctly
  • Use -Xlog:gc* (the Java 9+ replacement for -XX:+PrintGCDetails) and analyze logs after every deployment
  • Consider -XX:+UseStringDeduplication for string-heavy applications
  • In Kubernetes, set resource requests/limits accounting for heap: memory = Xmx + Metaspace + 25% overhead
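For the JMX bullet above, the same collector MBeans exposed remotely under `java.lang:type=GarbageCollector` are reachable in-process via `ManagementFactory`. A minimal sketch printing per-collector collection counts and cumulative pause time, which you can poll periodically and export to Grafana:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcMetrics {
    public static void main(String[] args) {
        // On G1 this typically lists "G1 Young Generation" and
        // "G1 Old Generation"; counts and times are cumulative
        // since JVM start, so export them as monotonic counters.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Computing a rate from successive samples of `getCollectionTime()` gives the "GC time per minute" signal that pairs well with the 80%-of-MaxGCPauseMillis alert suggested above.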