## Introduction
Go memory leaks and goroutine leaks occur when memory the program no longer needs stays reachable, or when goroutines block forever and never exit. Unlike C/C++, Go has automatic garbage collection, so a "leak" typically means objects are still referenced somewhere (preventing collection) or goroutines are stuck waiting. Symptoms include steadily increasing RSS memory, a growing goroutine count, eventual OOM kills, and performance degradation. Leaks often stem from unclosed channels, unbounded caches, references held by long-lived structures, or goroutines waiting on channels that never send.
## Symptoms
- RSS (Resident Set Size) memory grows continuously over hours/days
- Goroutine count increases without bound (check `runtime.NumGoroutine()`)
- GC pause times increase as heap grows
- Container killed with OOMKilled exit code 137
- Application becomes unresponsive as memory pressure increases
- Issue appears after traffic increase, new feature deploy, or configuration change
## Common Causes
- Goroutine blocked on channel send/receive with no counterpart
- Goroutine in infinite loop without exit condition
- References from long-lived structures (globals, caches, linked nodes) keeping objects reachable
- Unbounded cache or map growing without eviction
- Slice created from large array preventing GC of underlying array
- C memory allocated via cgo not being freed
- Timer or ticker never stopped
## Step-by-Step Fix
### 1. Monitor goroutine count and memory
Add runtime metrics to detect leaks early:
```go
// Add a monitoring endpoint, e.g. http.HandleFunc("/debug/memstats", memHandler)
import (
	"encoding/json"
	"net/http"
	"runtime"
)

type MemStats struct {
	Goroutines int    `json:"goroutines"`
	HeapAlloc  uint64 `json:"heap_alloc"`
	HeapSys    uint64 `json:"heap_sys"`
	NumGC      uint32 `json:"num_gc"`
	PauseTotal uint64 `json:"pause_total_ns"`
}

func memHandler(w http.ResponseWriter, r *http.Request) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	stats := MemStats{
		Goroutines: runtime.NumGoroutine(),
		HeapAlloc:  m.HeapAlloc,
		HeapSys:    m.HeapSys,
		NumGC:      m.NumGC,
		PauseTotal: m.PauseTotalNs,
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(stats)
}
```
Prometheus metrics:
```go
import (
	"runtime"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	goroutineCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "app_goroutines_count",
		Help: "Number of goroutines",
	})
	heapAlloc = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "app_heap_alloc_bytes",
		Help: "Heap memory allocated",
	})
)

func init() {
	prometheus.MustRegister(goroutineCount, heapAlloc)
}

func collectMetrics() {
	ticker := time.NewTicker(10 * time.Second)
	go func() {
		defer ticker.Stop()
		for range ticker.C {
			goroutineCount.Set(float64(runtime.NumGoroutine()))
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			heapAlloc.Set(float64(m.HeapAlloc))
		}
	}()
}
```
### 2. Capture heap profile with pprof
Enable pprof and capture heap snapshot:
```go
// Import pprof in the main package for its side-effect handlers
import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// pprof endpoints become available at /debug/pprof/
	go http.ListenAndServe("localhost:6060", nil)

	// ... rest of application
}
```
Capture heap profile:
```bash
# Take heap snapshot
go tool pprof http://localhost:6060/debug/pprof/heap

# Or save to file
curl -o heap.pprof http://localhost:6060/debug/pprof/heap

# Analyze interactively
go tool pprof heap.pprof

# pprof commands:
(pprof) top10                     # Show top 10 memory consumers
(pprof) list FuncName             # Show annotated source for a function
(pprof) web                       # Generate call graph (requires graphviz)
(pprof) sample_index=alloc_space  # Switch to allocations (not just in-use)
```
Key pprof views:
```bash
# Show in-use memory (what's leaking)
go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap

# Show cumulative allocations (what's being allocated)
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap

# Compare before/after snapshots
go tool pprof -base before.pprof after.pprof
```
### 3. Capture goroutine profile
Identify where goroutines are blocked:
```bash
# Capture goroutine profile
curl -o goroutine.pprof http://localhost:6060/debug/pprof/goroutine

# Analyze
go tool pprof goroutine.pprof

# Useful commands:
(pprof) top     # Functions with the most goroutines parked in them
(pprof) traces  # Full blocking stack traces
```
Programmatic goroutine dump:
```go
import (
	"os"
	"runtime"
	"runtime/pprof"
)

func dumpGoroutines() {
	f, _ := os.Create("/tmp/goroutines.out")
	defer f.Close()

	// debug=2 writes full stack traces for every goroutine
	pprof.Lookup("goroutine").WriteTo(f, 2)
}

// Or get the stack traces as a string
func getGoroutineStack() string {
	buf := make([]byte, 1<<20)
	return string(buf[:runtime.Stack(buf, true)])
}
```
### 4. Find goroutine leaks with goleak
Use uber-go/goleak to detect goroutine leaks in tests:
```go
import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	// Verify no goroutines are left running after all tests finish
	goleak.VerifyTestMain(m)
}

func TestMyFunction(t *testing.T) {
	// Test code; goleak verifies goroutines were cleaned up
}

// Or verify in a specific test
func TestSpecific(t *testing.T) {
	defer goleak.VerifyNone(t)

	// Code that should not leak
	go func() {
		// ...
	}()
}
```
### 5. Fix channel-based goroutine leaks
Goroutines blocking on channels are the most common leak:
```go
// WRONG: goroutine leaks, channel never closed
func wrong() {
	ch := make(chan int)
	go func() {
		for v := range ch { // Blocks forever if ch is never closed
			process(v)
		}
	}()

	// Send some values but forget to close
	ch <- 1
	ch <- 2
	// Channel never closed; the goroutine stays stuck in range
}

// CORRECT: close the channel when done sending
func correct() {
	ch := make(chan int)
	go func() {
		for v := range ch {
			process(v)
		}
	}()

	ch <- 1
	ch <- 2
	close(ch) // range loop exits, goroutine terminates
}

// CORRECT: use context for cancellation
func withContext(ctx context.Context) {
	ch := make(chan int)
	go func() {
		for {
			select {
			case v := <-ch:
				process(v)
			case <-ctx.Done():
				return // Exit on cancellation
			}
		}
	}()

	// Later, cancel the context to stop the goroutine
	// cancel()
}
```
### 6. Fix unbounded cache/map growth
Implement cache with eviction policy:
```go
// WRONG: unbounded map growth
var cache = make(map[string][]byte)

func wrong(key string, value []byte) {
	cache[key] = value // Grows forever, no eviction
}

// CORRECT: use an LRU cache with a size limit
import "github.com/hashicorp/golang-lru/v2"

var cache *lru.Cache[string, []byte]

func init() {
	cache, _ = lru.New[string, []byte](1000) // Max 1000 items
}

func correct(key string, value []byte) {
	cache.Add(key, value) // Evicts the oldest entry when full
}

// CORRECT: mutex-protected map with TTL and periodic cleanup
type Cache struct {
	mu   sync.RWMutex
	data map[string]item
}

type item struct {
	value      interface{}
	expiration time.Time
}

func (c *Cache) Get(key string) (interface{}, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()

	it, ok := c.data[key]
	if !ok || time.Now().After(it.expiration) {
		return nil, false
	}
	return it.value, true
}

func (c *Cache) cleanup() {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		c.mu.Lock()
		now := time.Now()
		for k, v := range c.data {
			if now.After(v.expiration) {
				delete(c.data, k)
			}
		}
		c.mu.Unlock()
	}
}
```
### 7. Fix slice memory retention
Slices can retain a reference to a large underlying array:
```go
// WRONG: small slice retains the large backing array
func wrong() []byte {
	large := make([]byte, 100<<20) // 100MB
	// Fill large with data

	small := large[:100] // Only need 100 bytes
	return small         // But the entire 100MB array can't be GC'd
}

// CORRECT: copy to a new slice
func correct() []byte {
	large := make([]byte, 100<<20)

	small := make([]byte, 100)
	copy(small, large[:100])
	return small // Only 100 bytes retained
}

// CORRECT: use append to force a copy
func correct2() []byte {
	large := make([]byte, 100<<20)
	return append([]byte(nil), large[:100]...)
}
```
### 8. Fix timer and ticker leaks
Timers and tickers must be stopped:
```go
// WRONG: ticker never stopped
func wrong() {
	ticker := time.NewTicker(time.Second)
	go func() {
		for range ticker.C {
			process()
		}
	}()
	// Ticker keeps firing (and the goroutine keeps running) after wrong returns
}

// CORRECT: stop the ticker when the goroutine exits
func correct(ctx context.Context) {
	ticker := time.NewTicker(time.Second)

	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				process()
			case <-ctx.Done():
				return
			}
		}
	}()
}

// CORRECT: timer with proper cleanup
func withTimer(timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case err := <-doWork(): // doWork returns <-chan error
		return err
	case <-timer.C:
		return errors.New("timeout")
	}
}
```
### 9. Use trace tool for runtime analysis
Go's trace tool shows goroutine scheduling:
```bash
# Capture a 30-second trace
curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=30"
```

Or programmatically:

```go
import (
	"os"
	"runtime/trace"
)

func captureTrace() {
	f, _ := os.Create("/tmp/trace.out")
	defer f.Close()

	trace.Start(f)
	defer trace.Stop()

	// ... code to trace
}
```

Analyze the trace:

```bash
go tool trace trace.out

# The trace viewer shows:
# - Goroutine lifecycle
# - GC events
# - Syscalls
# - Scheduler activity
```
### 10. Fix circular reference leaks
Go's tracing collector does reclaim cycles, so circular references alone don't leak. They leak when any node stays reachable, for example from a global or a long-lived struct, because each node then retains the whole chain:
```go
// WRONG: cycle reachable from long-lived state retains everything
type Node struct {
	Next *Node
	Prev *Node
	Data []byte // Large payload
}

func createLeak() {
	a := &Node{Data: make([]byte, 1<<20)}
	b := &Node{Data: make([]byte, 1<<20)}
	a.Next = b
	b.Prev = a
	// If anything holds a reference to a or b (a global, a cache entry),
	// the entire chain and both payloads stay retained
}

// CORRECT: break cycles you don't need
type Node struct {
	Next *Node
	Data []byte
}

func correct() {
	a := &Node{Data: make([]byte, 1<<20)}
	b := &Node{Data: make([]byte, 1<<20)}
	a.Next = b // No back-reference

	// When a becomes unreachable, b does too, and both are collected
}

// Or use runtime.SetFinalizer to release non-GC resources
import "runtime"

type Resource struct {
	handle uintptr
}

func newResource() *Resource {
	r := &Resource{handle: allocate()} // allocate/free are placeholders
	runtime.SetFinalizer(r, func(r *Resource) {
		free(r.handle) // Cleanup when r is collected
	})
	return r
}
```
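Because Go uses a tracing collector, a cycle with no outside references is reclaimed like any other garbage. A small experiment can confirm this (`cnode`, `liveHeap`, and `cycleIsCollected` are illustrative helpers):

```go
package main

import (
	"fmt"
	"runtime"
)

type cnode struct {
	next *cnode
	data []byte
}

// liveHeap forces a collection and returns the live heap size.
func liveHeap() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// cycleIsCollected builds a two-node cycle holding ~16 MiB, drops all
// references to it, and checks that a GC reclaims the memory.
func cycleIsCollected() bool {
	a := &cnode{data: make([]byte, 8<<20)}
	b := &cnode{data: make([]byte, 8<<20)}
	a.next, b.next = b, a // mutual references: a <-> b

	during := liveHeap() // cycle still reachable through a and b
	runtime.KeepAlive(a)
	runtime.KeepAlive(b)

	after := liveHeap() // nothing references the cycle anymore
	return after < during && during-after > 8<<20
}

func main() {
	fmt.Println("cycle collected despite back-reference:", cycleIsCollected())
}
```

If this memory were genuinely leaking in your program, the culprit is a live reference into the structure, which the heap profile from step 2 will attribute to its allocation site.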
## Prevention
- Always close channels when done sending
- Use context.Context for goroutine cancellation
- Implement cache eviction (LRU, TTL)
- Stop timers and tickers with defer
- Use goleak in the test suite
- Monitor goroutine count in production
- Profile memory before major releases
- Avoid circular references in long-lived objects
## Related Errors
- **fatal error: runtime: out of memory**: the Go runtime could not allocate from the OS (distinct from the kernel OOM killer, which kills with exit code 137)
- **runtime: goroutine stack exceeds limit**: usually infinite recursion
- **panic: send on closed channel**: a send raced with `close` (a correctness bug, not a leak)