Introduction
The G1 (Garbage-First) collector aims to meet pause-time targets by dividing the heap into regions and collecting those with the most garbage first. When the heap is undersized, the allocation rate is high, or large objects fill regions quickly, G1 falls back to full GCs or runs mixed collections that exceed the target pause time. The result is request latency spikes, timeout errors, and SLA violations.
Symptoms
- P99 latency spikes correlating with GC pauses
- Mixed pauses exceeding the target, e.g. `GC pause (G1 Evacuation Pause) (mixed) 450ms`
- Full GC events: `Pause Full (Allocation Failure) 1200ms`
- Application timeouts during GC pauses
- G1 humongous allocations filling regions with large objects
```
[2024-01-15T10:30:00.123+0000] GC(45) Pause Young (Normal) (G1 Evacuation Pause) 1024M->856M(2048M) 380.5ms
# Target was 200ms, actual was 380ms - SLA violation!
[2024-01-15T10:30:15.456+0000] GC(46) Pause Full (Allocation Failure) 1800M->1200M(2048M) 1450.2ms
# Full GC: stop-the-world for 1.45 seconds!
```
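One quick way to quantify the first symptom is to count pauses over the target directly from the unified GC log. A minimal sketch against sample log lines (the 200ms target and the `gc.log` file name are assumptions; adjust both for your app):

```shell
# Create a small sample gc.log standing in for real -Xlog:gc* output
cat > gc.log <<'EOF'
[2024-01-15T10:30:00.123+0000] GC(45) Pause Young (Normal) (G1 Evacuation Pause) 1024M->856M(2048M) 380.5ms
[2024-01-15T10:30:05.000+0000] GC(46) Pause Young (Normal) (G1 Evacuation Pause) 900M->700M(2048M) 150.0ms
[2024-01-15T10:30:15.456+0000] GC(47) Pause Full (Allocation Failure) 1800M->1200M(2048M) 1450.2ms
EOF

# The last field is the pause duration; strip "ms" and compare to the target
violations=$(awk '/Pause/ { d=$NF; sub(/ms$/, "", d); if (d+0 > 200) n++ } END { print n+0 }' gc.log)
echo "$violations pauses exceeded 200ms"
```

Running this against the sample above reports 2 violations (the 380.5ms young pause and the 1450.2ms full GC).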
Common Causes
- `MaxGCPauseMillis` target too aggressive for the heap size
- Heap too small, causing frequent collections
- Humongous objects (at least half a region in size) fragmenting the heap
- Promotion failure causing full GC
- Allocation rate exceeding GC throughput capacity
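The humongous threshold follows from G1's default region sizing: roughly heap size / 2048, rounded down to a power of two and clamped to 1-32MB (exact ergonomics can vary by JVM version). A rule-of-thumb sketch for a 4GB heap:

```shell
# Approximate G1's default region size and humongous threshold
heap_mb=4096                        # e.g. -Xmx4g
region_mb=$(( heap_mb / 2048 ))
# round down to a power of two
p=1
while [ $(( p * 2 )) -le "$region_mb" ]; do p=$(( p * 2 )); done
region_mb=$p
# clamp to G1's 1MB..32MB range
[ "$region_mb" -lt 1 ]  && region_mb=1
[ "$region_mb" -gt 32 ] && region_mb=32
threshold_mb=$(( region_mb / 2 ))
echo "region=${region_mb}MB humongous>=${threshold_mb}MB"
```

For a 4GB heap this gives 2MB regions, so any allocation of 1MB or more is treated as humongous.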
Step-by-Step Fix
1. Analyze GC logs:
```bash
# Enable detailed GC logging (unified logging, Java 9+)
java -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar app.jar

# Analyze with GCViewer or GCEasy
# Look for:
# - Pause times exceeding MaxGCPauseMillis
# - Full GC frequency
# - Heap occupancy before/after GC

# Quick analysis with grep
grep "Pause" gc.log | awk '{print $NF}' | sort -n | tail -10
```
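To check the pause distribution against `MaxGCPauseMillis`, the durations extracted above can be reduced to a nearest-rank P99. A sketch over illustrative values (in practice, pipe in the grep output):

```shell
# Nearest-rank P99 over a list of pause durations in ms
pauses="120.0
180.0
350.0"
p99=$(printf '%s\n' "$pauses" \
  | sort -n \
  | awk '{ a[NR]=$1 } END { i=int(NR*0.99); if (i < NR*0.99) i++; if (i < 1) i=1; print a[i] }')
echo "P99 pause: ${p99}ms"
```

With the three sample values this reports 350.0ms, the worst pause, which is what nearest-rank P99 degenerates to on small samples.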
2. Tune G1 collector settings:
```bash
# Set a realistic pause target (200-500ms for most apps).
# Use the same min and max heap size; G1HeapRegionSize is
# auto-detected by default; a lower InitiatingHeapOccupancyPercent
# starts concurrent marking (and thus mixed GCs) earlier.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=300 \
     -Xms4g -Xmx4g \
     -XX:G1HeapRegionSize=16m \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -jar app.jar
```
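An `InitiatingHeapOccupancyPercent` value can be sanity-checked against the live set (heap occupancy after a full GC): if marking starts below the live set, G1 will mark continuously. A sketch with illustrative numbers (note that since JDK 9 G1 adapts this threshold at runtime unless adaptive IHOP is disabled):

```shell
# Check that the IHOP threshold sits above the observed live set
heap_mb=4096
live_set_mb=1500          # heap occupancy after a full GC (measure yours)
ihop_pct=45
ihop_mb=$(( heap_mb * ihop_pct / 100 ))
if [ "$ihop_mb" -le "$live_set_mb" ]; then
  echo "IHOP ${ihop_pct}% (${ihop_mb}MB) is below the live set - marking will run continuously"
else
  echo "OK: marking starts at ${ihop_mb}MB, live set is ${live_set_mb}MB"
fi
```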
3. Reduce humongous allocations:
```bash
# G1 region size should be larger than your biggest objects.
# Default region size: heap_size / 2048 (power of two, min 1MB, max 32MB)
# Humongous threshold: region_size / 2

# If you have many 10MB objects, set region size to 32MB
java -XX:+UseG1GC \
     -XX:G1HeapRegionSize=32m \
     -Xmx8g \
     -jar app.jar
# Now objects under 16MB are not humongous
```
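One way to choose the flag value: pick the smallest power-of-two region size whose humongous threshold (half the region) exceeds your largest object. A minimal sketch, with a hypothetical 10MB maximum object size:

```shell
# Smallest power-of-two region (1-32MB) keeping max_object_mb
# strictly below the humongous threshold (region/2)
max_object_mb=10
region_mb=1
while [ "$region_mb" -lt 32 ] && [ $(( region_mb / 2 )) -le "$max_object_mb" ]; do
  region_mb=$(( region_mb * 2 ))
done
echo "-XX:G1HeapRegionSize=${region_mb}m"
```

For 10MB objects this lands on 32m, matching the example above; objects larger than 16MB cannot avoid being humongous, since 32MB is G1's maximum region size.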
4. Increase heap to reduce GC frequency:
```bash
# Larger heap = fewer collections = fewer pause opportunities
# But: larger heap = longer individual pauses
# Find the sweet spot with load testing.
# Double the heap; AlwaysPreTouch touches all pages at startup
# so the cost is paid once rather than during early collections.
java -Xms8g -Xmx8g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=300 \
     -XX:+AlwaysPreTouch \
     -jar app.jar
```
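When load testing heap sizes, a useful single number is GC overhead: the fraction of wall-clock time spent paused. A sketch with illustrative figures (in practice, sum the real pause durations from gc.log over the observation window):

```shell
# GC overhead = total pause time / wall-clock window
total_pause_ms=4200     # sum of all pause durations in the window
window_ms=60000         # 60s observation window
overhead_pct=$(( total_pause_ms * 100 / window_ms ))
echo "GC overhead: ${overhead_pct}%"
# Rule of thumb: sustained overhead above ~5% usually means the heap
# is too small or the allocation rate is too high for the settings.
```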
5. Consider ZGC for sub-millisecond pauses:
```bash
# ZGC (production-ready since Java 15) keeps pauses under 1ms
# regardless of heap size; -XX:+ZGenerational enables generational
# ZGC on Java 21+.
java -XX:+UseZGC \
     -Xmx16g \
     -XX:+ZGenerational \
     -jar app.jar
# For latency-critical applications with large heaps
```
Prevention
- Monitor GC metrics with JMX: `java.lang:type=GarbageCollector`
- Set up Grafana dashboards tracking GC pause times and frequency
- Alert when P99 pause time exceeds 80% of MaxGCPauseMillis
- Load test with production-like data volumes to size the heap correctly
- Use `-Xlog:gc*` (the unified-logging replacement for `-XX:+PrintGCDetails`, which is obsolete since Java 9) and analyze the logs after every deployment
- Consider `-XX:+UseStringDeduplication` for string-heavy applications
- In Kubernetes, set resource requests/limits accounting for the full JVM footprint: memory = Xmx + Metaspace + 25% overhead
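The last rule of thumb can be turned into arithmetic. A sketch with example values (the Metaspace figure is an assumption; measure yours with `jcmd <pid> VM.native_memory` or GC logs):

```shell
# Kubernetes memory limit ~= (Xmx + Metaspace) + 25% overhead
xmx_mb=4096
metaspace_mb=256
limit_mb=$(( (xmx_mb + metaspace_mb) * 125 / 100 ))
echo "resources.limits.memory: ${limit_mb}Mi"
```

For a 4GB heap this suggests a ~5.4GB container limit; undersizing it is a common cause of OOMKilled pods even when the heap itself never fills.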