Introduction
Garbage collection pauses occur when the JVM stops application threads to reclaim memory. When these pauses exceed your application's latency SLA (e.g., p99 under 200ms), users experience timeouts and degraded performance. Long GC pauses are often caused by large heap sizes, inappropriate GC algorithm selection, or excessive object allocation rates.
This issue is common in high-throughput services, financial trading systems, and real-time applications with strict latency requirements.
Symptoms
- Application response times spike periodically, correlating with GC events
- Logs show "GC pause (G1 Evacuation Pause) 1234ms" exceeding SLA threshold
- p99 latency is much higher than p50, indicating tail latency from GC pauses
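The gap between p50 and p99 can be made concrete with a few lines of arithmetic. The following is a minimal sketch of a nearest-rank percentile calculation; the class name `LatencyPercentiles` and the sample latencies are illustrative assumptions, not measurements from a real service.

```java
import java.util.Arrays;

public class LatencyPercentiles {

    /** Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data: 98 fast requests plus 2 that coincided with a long GC pause.
        long[] samples = new long[100];
        Arrays.fill(samples, 20);
        samples[98] = 1250;
        samples[99] = 1300;

        System.out.println("p50 = " + percentile(samples, 50) + " ms"); // p50 = 20 ms
        System.out.println("p99 = " + percentile(samples, 99) + " ms"); // p99 = 1250 ms
    }
}
```

Only two slow requests out of a hundred are enough to push p99 far above the median, which is the pattern that periodic GC pauses tend to produce.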
Common Causes
- Heap size too large for the selected GC algorithm, causing long collection cycles
- Using throughput-optimized GC (ParallelGC) when low latency is required
- High object allocation rate, causing frequent young collections and premature promotion of short-lived objects to Old Gen
- Memory leak causing Old Gen to fill up, triggering long Full GC cycles
Step-by-Step Fix
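1. Switch to a low-latency GC algorithm: Use G1GC, ZGC, or Shenandoah when you have a pause-time goal.

```bash
# G1GC with a pause-time target (G1 is the default collector since Java 9).
# MaxGCPauseMillis is a soft goal, not a guarantee, and 200 is already the default;
# lower it if your latency SLA requires shorter pauses.
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xms4g -Xmx4g -jar app.jar

# ZGC for very low pauses (production-ready since Java 15; sub-millisecond pauses since JDK 16):
java -XX:+UseZGC -Xms4g -Xmx4g -jar app.jar

# Shenandoah GC for low pauses (OpenJDK 12+, production-ready since 15; not included in every vendor's build):
java -XX:+UseShenandoahGC -Xms4g -Xmx4g -jar app.jar
```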
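After changing the collector flag, it can help to confirm from inside the process which collectors the JVM actually registered. Here is a minimal sketch using the standard `java.lang.management` API; the class name `ActiveCollectors` is an illustrative assumption. The exact collector names vary by JDK version and collector (with G1 they typically start with "G1").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {

    /** Returns the names of the collectors registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is cumulative elapsed milliseconds, not a per-pause figure.
            System.out.printf("%s: count=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```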
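2. Right-size the heap: A heap much larger than the working set can lengthen collections, while one that is too small triggers frequent or Full GCs. Size it from measured usage.

```bash
# Instead of -Xmx16g, try -Xmx4g if your working set fits
# Monitor actual heap usage (Java 9+; on Java 8 use -XX:+PrintGCDetails instead of -Xlog):
java -Xlog:gc* -jar app.jar

# Look for actual heap usage in the GC logs.
# Java 8-style example (the unified logging format on Java 9+ differs):
# [GC pause (G1 Evacuation Pause), 0.045 secs]
# [Eden: 512.0M(512.0M)->0.0B(512.0M) Survivors: 64.0M->64.0M Heap: 2.1G(4.0G)->1.6G(4.0G)]
```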
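Heap occupancy can also be read at runtime rather than from logs. The sketch below uses the standard `MemoryMXBean`; the class name `HeapUsageReport` and the helper `usedPercent` are illustrative assumptions. Note that the "used" figure includes garbage that has not been collected yet, so the post-GC heap size in the logs is usually the better estimate of the working set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageReport {

    /** Percentage of the maximum heap currently in use; -1 if the maximum is undefined. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no maximum is defined
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB (%.1f%% of max)%n",
                heap.getUsed() / mb, heap.getCommitted() / mb, heap.getMax() / mb,
                usedPercent(heap.getUsed(), heap.getMax()));
    }
}
```

If the percentage stays low even after sustained load, the heap is probably larger than the working set needs.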
3. Enable GC logging for analysis: Use JFR or GC logs to understand pause patterns.

```bash
# GC logging (Java 9+):
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar

# Java Flight Recorder for detailed analysis:
java -XX:StartFlightRecording=duration=60s,filename=recording.jfr -jar app.jar

# Analyze with:
jfr print --events GCHeapSummary recording.jfr
```
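Once a GC log exists, the pauses that break the SLA can be pulled out with a small scanner. This is a minimal sketch, assuming Java 9+ unified logging where pause summary lines end with a duration such as `45.123ms`; the class name `GcPauseScanner` and the sample lines are hypothetical, written to resemble that format rather than copied from a real log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseScanner {

    // Matches lines that contain "Pause" and end with a duration in milliseconds.
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*?(\\d+(?:\\.\\d+)?)ms\\s*$");

    /** Returns the pause durations (in ms) that exceed the given threshold. */
    static List<Double> pausesOver(List<String> logLines, double thresholdMs) {
        List<Double> slow = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                double ms = Double.parseDouble(m.group(1));
                if (ms > thresholdMs) {
                    slow.add(ms);
                }
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the style of -Xlog:gc* output.
        List<String> lines = Arrays.asList(
                "[1.0s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 512M->64M(4096M) 45.123ms",
                "[2.0s][info][gc] GC(2) Pause Full (G1 Compaction Pause) 3900M->2100M(4096M) 1234.567ms",
                "[2.1s][info][gc,cpu] GC(2) User=1.20s Sys=0.01s Real=1.23s");
        System.out.println(pausesOver(lines, 200.0)); // prints [1234.567]
    }
}
```

For a real log, the lines could be loaded with `java.nio.file.Files.readAllLines` and passed to `pausesOver` with the SLA threshold.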
4. Reduce object allocation rate: Reuse objects to decrease GC pressure.

```java
// `data` and `process(...)` are assumed to be defined elsewhere.

// BAD: builds a fresh String (and, on Java 8, a temporary StringBuilder) every iteration
for (int i = 0; i < 1000000; i++) {
    String result = "item-" + i + "-" + data;
    process(result);
}

// BETTER: reuse one StringBuilder and its backing array across iterations
StringBuilder sb = new StringBuilder(64);
for (int i = 0; i < 1000000; i++) {
    sb.setLength(0); // reset the builder without reallocating its buffer
    sb.append("item-").append(i).append('-').append(data);
    process(sb.toString()); // still allocates one String per iteration
}
```
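How much a rewrite like this saves depends on the JDK version, so it is worth measuring rather than assuming. The sketch below compares the bytes allocated by the two loop styles using `com.sun.management.ThreadMXBean`, a HotSpot-specific extension that may be unavailable on other JVMs. The class name `AllocationProbe`, the `sink` field, and the `data` value are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {

    // Keeps results reachable so the JIT cannot optimize the allocations away.
    static volatile Object sink;

    /** Bytes allocated by the current thread while running the task (HotSpot-specific). */
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        String data = "payload"; // hypothetical stand-in for the article's `data`

        long concat = allocatedBytes(() -> {
            for (int i = 0; i < 100_000; i++) {
                sink = "item-" + i + "-" + data;
            }
        });

        StringBuilder sb = new StringBuilder(64);
        long reused = allocatedBytes(() -> {
            for (int i = 0; i < 100_000; i++) {
                sb.setLength(0);
                sb.append("item-").append(i).append('-').append(data);
                sink = sb.toString();
            }
        });

        System.out.println("concatenation: " + concat + " bytes");
        System.out.println("reused builder: " + reused + " bytes");
    }
}
```

The numbers are only a rough guide (warm-up and JIT compilation affect them), but they show whether a change reduces allocation before it ships.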
Prevention
- Use G1GC or ZGC for latency-sensitive applications
- Set -XX:MaxGCPauseMillis to your pause-time goal; the JVM treats it as a soft target, not a guarantee
- Monitor GC metrics in production with Prometheus JMX exporter
- Profile allocation hotspots with async-profiler or JFR
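The GC data that JMX-based monitoring relies on can also be consumed directly inside the application. The following is a minimal sketch of a listener that reports collections longer than a threshold, using the HotSpot-specific `com.sun.management.GarbageCollectionNotificationInfo`; it is not the Prometheus exporter's own code, and the class name `GcPauseWatcher` is an illustrative assumption. For concurrent collectors the reported duration may cover a whole GC cycle rather than only the stop-the-world pause.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcPauseWatcher {

    /** Registers a listener on every collector; returns how many collectors were hooked. */
    static int watch(long thresholdMs) {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return; // not a GC notification
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            long durationMs = info.getGcInfo().getDuration();
            if (durationMs > thresholdMs) {
                System.err.printf("GC over threshold: %s (%s, cause=%s) took %d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(), durationMs);
            }
        };

        int hooked = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                hooked++;
            }
        }
        return hooked;
    }

    public static void main(String[] args) throws InterruptedException {
        watch(200);        // report collections longer than 200 ms
        System.gc();       // request a collection so the listener has something to observe
        Thread.sleep(500); // notifications are delivered asynchronously
    }
}
```

In a service, the `System.err` call would typically be replaced by a metrics counter or an alert hook.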