Introduction

Goroutine leaks and channel blocking errors occur when goroutines are spawned but never terminate, consuming memory and scheduler resources until the application crashes or becomes unresponsive. A goroutine leaks when it blocks indefinitely on a channel operation, waits on a condition that never becomes true, or fails to respond to context cancellation. Channel blocking occurs when a send has no receiver or a receive has no sender, causing deadlock or resource exhaustion. Common causes include sends on unbuffered channels with no receiver, sends on full buffered channels, channel receives in a select with no default case or timeout, goroutines orphaned when their parent exits, contexts that are not propagated or checked, worker pool workers that are never terminated, mutex deadlocks from inconsistent lock ordering, WaitGroups that are never decremented, and background goroutines with no shutdown mechanism. Fixing these problems requires understanding Go's concurrency patterns, channel semantics, context propagation, goroutine lifecycle management, and leak detection tools. This guide provides production-proven techniques for preventing and fixing goroutine leaks in web services, background workers, and other concurrent Go applications.

Symptoms

  • Application memory grows continuously over time
  • fatal error: all goroutines are asleep - deadlock!
  • Goroutine count increases without bound
  • goroutine profile: total N goroutines keeps growing
  • Application becomes unresponsive but doesn't crash
  • Context deadline exceeded from blocked operations
  • Channel send/receive blocks indefinitely
  • WaitGroup.Wait() never returns
  • Worker pool stops processing tasks
  • pprof shows goroutines stuck in channel operations

Common Causes

  • Send on unbuffered channel with no receiver ready
  • Send on buffered channel that is full
  • Receive from a channel to which no value will ever be sent
  • Goroutine spawned without exit condition
  • Context not checked in long-running goroutine
  • Mutex held while waiting on channel (deadlock)
  • WaitGroup.Add() without corresponding Done()
  • Worker pool not draining channel on shutdown
  • Select without default or timeout blocking forever
  • Background goroutine orphaned when parent exits

Step-by-Step Fix

### 1. Detect goroutine leaks

Use goleak for leak detection:

```go
// Test with goleak
import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}

// Or per-test
func TestMyFunction(t *testing.T) {
	defer goleak.VerifyNone(t)

	// Test code that might leak
	go func() {
		// Potential leak
	}()
}

// Find a specific leak while ignoring known goroutines
func TestLeak(t *testing.T) {
	defer goleak.VerifyNone(
		t,
		goleak.IgnoreTopFunction("internal/poll.runtime_pollWait"),
		goleak.IgnoreCurrent(), // ignore goroutines that already exist
	)
}
```

Install goleak:

```bash
go get go.uber.org/goleak
```

Use pprof for goroutine profiling:

```go
// Enable pprof in the application
import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers
)

func main() {
	// Start the pprof server on a local port
	go http.ListenAndServe("localhost:6060", nil)

	// Application code
}
```

Analyze goroutine profile:

```bash
# Capture goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Or save to a file first
curl -o goroutines.pprof http://localhost:6060/debug/pprof/goroutine?debug=0
go tool pprof goroutines.pprof

# View in browser
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine

# In pprof interactive mode:
#   top           - show functions with the most goroutines
#   list FuncName - show annotated source for a specific function
#   traces        - show stack traces grouped by goroutine count
#   web           - show the call graph in a browser
```

Manual goroutine count monitoring:

```go
// Monitor goroutine count
import (
	"log"
	"runtime"
	"time"
)

func monitorGoroutines() {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	var lastCount int
	for range ticker.C {
		count := runtime.NumGoroutine()
		if count > lastCount+10 { // significant increase
			log.Printf("Goroutine count increased: %d -> %d", lastCount, count)
		}
		lastCount = count
	}
}
```

### 2. Fix channel blocking issues

Unbuffered channel blocking:

```go
// WRONG: Send blocks forever (no receiver)
func leak() {
	ch := make(chan int) // unbuffered
	go func() {
		ch <- 42 // blocks until someone receives
	}()
	// Goroutine leaked - nobody ever receives
}

// CORRECT: Use a buffered channel or ensure a receiver exists
func fixed() {
	ch := make(chan int, 1) // buffered
	go func() {
		ch <- 42 // won't block (buffer has room)
	}()

	// Or ensure a receiver
	val := <-ch
	_ = val
}

// CORRECT: Use select with a timeout and a done channel
func withTimeout() {
	ch := make(chan int)
	done := make(chan bool)

	go func() {
		select {
		case ch <- 42:
		case <-time.After(1 * time.Second):
			log.Println("Send timed out")
		case <-done:
			return
		}
	}()

	close(done) // signal the goroutine to exit
}
```

Buffered channel full:

```go
// WRONG: Full channel causes the send to block
func leak() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2 // channel is now full
	go func() {
		ch <- 3 // blocks forever - channel full, no receiver
	}()
}

// CORRECT: Non-blocking send
func nonBlockingSend() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2

	select {
	case ch <- 3: // try to send
	default: // channel full, don't block
		log.Println("Channel full, dropping message")
	}
}

// CORRECT: Send with timeout
func sendWithTimeout(ch chan<- int, val int) error {
	select {
	case ch <- val:
		return nil
	case <-time.After(5 * time.Second):
		return fmt.Errorf("send timeout")
	}
}
```

### 3. Fix goroutine lifecycle with context

Proper context propagation:

```go
// WRONG: Goroutine doesn't respond to cancellation
func leak() {
	go func() {
		for {
			doWork() // runs forever
		}
	}()
}

// CORRECT: Check the context in the goroutine
func fixed(ctx context.Context) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				return // exit on cancellation
			default:
				doWork()
			}
		}
	}()
}

// Usage
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
fixed(ctx)
```

Context with channel operations:

```go
// WRONG: Channel receive ignores cancellation
func leak(ctx context.Context, ch <-chan int) {
	go func() {
		val := <-ch // blocks even if ctx is cancelled
		process(val)
	}()
}

// CORRECT: Select on both the channel and the context
func fixed(ctx context.Context, ch <-chan int) {
	go func() {
		select {
		case val := <-ch:
			process(val)
		case <-ctx.Done():
			return // exit on cancellation
		}
	}()
}

// Send with context
func send(ctx context.Context, ch chan<- int, val int) error {
	select {
	case ch <- val:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

### 4. Fix worker pool leaks

Worker pool with proper shutdown:

```go
// WRONG: Workers never terminate
func leak(jobList []Job) {
	jobs := make(chan Job)

	// Start workers
	for i := 0; i < 10; i++ {
		go func() {
			for job := range jobs {
				process(job)
			}
		}()
	}

	// Send jobs
	for _, job := range jobList {
		jobs <- job
	}
	// Channel never closed, so the range never ends and workers block forever
}

// CORRECT: Close the channel to signal shutdown
func fixed(jobs []Job) {
	jobChan := make(chan Job, 100)
	var wg sync.WaitGroup

	// Start workers
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobChan {
				process(job)
			}
		}()
	}

	// Send jobs
	for _, job := range jobs {
		jobChan <- job
	}
	close(jobChan) // signal workers to exit

	wg.Wait() // wait for all workers to finish
}

// CORRECT: Worker pool with context
type WorkerPool struct {
	ctx    context.Context
	cancel context.CancelFunc
	jobs   chan Job
	wg     sync.WaitGroup
}

func NewWorkerPool(size int) *WorkerPool {
	ctx, cancel := context.WithCancel(context.Background())
	wp := &WorkerPool{
		ctx:    ctx,
		cancel: cancel,
		jobs:   make(chan Job, size),
	}

	// Start workers
	for i := 0; i < size; i++ {
		wp.wg.Add(1)
		go wp.worker()
	}

	return wp
}

func (wp *WorkerPool) worker() {
	defer wp.wg.Done()
	for {
		select {
		case <-wp.ctx.Done():
			return
		case job := <-wp.jobs:
			process(job)
		}
	}
}

func (wp *WorkerPool) Submit(job Job) error {
	select {
	case wp.jobs <- job:
		return nil
	case <-wp.ctx.Done():
		return fmt.Errorf("pool shutting down")
	}
}

func (wp *WorkerPool) Shutdown() {
	wp.cancel()  // signal workers to stop
	wp.wg.Wait() // wait for workers to finish
}
```

### 5. Fix WaitGroup issues

WaitGroup without Done():

```go
// WRONG: WaitGroup never decremented
func leak() {
	var wg sync.WaitGroup
	wg.Add(1)

	go func() {
		doWork()
		// Missing wg.Done() - Wait() blocks forever
	}()

	wg.Wait() // blocks forever
}

// CORRECT: Always call Done, via defer
func fixed() {
	var wg sync.WaitGroup
	wg.Add(1)

	go func() {
		defer wg.Done() // ensures Done() runs even if doWork panics
		doWork()
	}()

	wg.Wait()
}

// CORRECT: Call Add(1) right before starting each goroutine
func safe() {
	var wg sync.WaitGroup

	for i := 0; i < 10; i++ {
		wg.Add(1) // Add before starting the goroutine
		go func() {
			defer wg.Done()
			doWork()
		}()
	}

	wg.Wait()
}
```

WaitGroup with error handling:

```go
type Worker struct {
	wg   sync.WaitGroup
	errs chan error // must be buffered so error sends don't block
}

func (w *Worker) Do(ctx context.Context) error {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		if err := doWork(ctx); err != nil {
			select {
			case w.errs <- err:
			case <-ctx.Done():
			}
		}
	}()
	return nil
}

func (w *Worker) Wait() error {
	w.wg.Wait()
	close(w.errs)

	// Return the first error, if any
	for err := range w.errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```

### 6. Fix mutex deadlocks

Mutex with channel deadlock:

```go
// WRONG: Hold the mutex while waiting on a blocking operation
type Cache struct {
	mu   sync.Mutex
	data map[string]string
}

func (c *Cache) GetOrFetch(ctx context.Context, key string) string {
	c.mu.Lock()
	defer c.mu.Unlock()

	if val, ok := c.data[key]; ok {
		return val
	}

	// WRONG: Holding the mutex while fetching
	val, _ := fetchFromNetwork(ctx) // this might block for a long time
	c.data[key] = val
	return val
}

// CORRECT: Don't hold the mutex during the blocking operation
func (c *Cache) GetOrFetch(ctx context.Context, key string) string {
	// First check with the lock held
	c.mu.Lock()
	if val, ok := c.data[key]; ok {
		c.mu.Unlock()
		return val
	}
	c.mu.Unlock()

	// Fetch without holding the lock
	val, err := fetchFromNetwork(ctx)
	if err != nil {
		return ""
	}

	// Store the result with the lock held
	c.mu.Lock()
	c.data[key] = val
	c.mu.Unlock()

	return val
}
```

Mutex ordering to prevent deadlock:

```go
// WRONG: Inconsistent lock ordering
var mu1, mu2 sync.Mutex

func transferAtoB() {
	mu1.Lock()
	mu2.Lock()
	// transfer
	mu2.Unlock()
	mu1.Unlock()
}

func transferBtoA() {
	mu2.Lock() // different order - can deadlock with transferAtoB!
	mu1.Lock()
	// transfer
	mu1.Unlock()
	mu2.Unlock()
}

// CORRECT: Always acquire locks in the same order
func transferBtoA() {
	mu1.Lock() // same order as transferAtoB
	mu2.Lock()
	// transfer
	mu2.Unlock()
	mu1.Unlock()
}
```

### 7. Debug with goroutine dumps

Capture goroutine stack traces:

```go
import (
	"log"
	"os"
	"runtime"
)

func dumpGoroutines() {
	buf := make([]byte, 1<<20)    // 1 MB buffer
	n := runtime.Stack(buf, true) // true = all goroutines
	os.Stderr.Write(buf[:n])
}

// Or write to a file
func dumpGoroutinesToFile(path string) error {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = f.Write(buf[:n])
	return err
}

// Call inside the goroutine you want to guard:
defer func() {
	if r := recover(); r != nil {
		dumpGoroutinesToFile("/tmp/goroutine-dump.txt")
		log.Printf("Panic: %v", r)
	}
}()
```

Signal handler for goroutine dump:

```go
import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func setupGoroutineDump() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGUSR1)

	go func() {
		for range sigCh {
			log.Println("Dumping goroutines...")
			dumpGoroutines()
		}
	}()
}

// Usage: kill -SIGUSR1 <pid> to trigger a dump
```

### 8. Prevent leaks with patterns

Bounded goroutine pattern:

```go
// Pattern: Goroutine with timeout, cancellation, and panic recovery.
// Both channels are buffered so the goroutine never blocks even if the
// caller has already returned on ctx.Done().
func boundedOperation(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	result := make(chan Result, 1)
	errCh := make(chan error, 1)

	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("panic: %v", r)
			}
		}()

		res, err := doWork(ctx)
		if err != nil {
			errCh <- err
			return
		}
		result <- res
	}()

	select {
	case res := <-result:
		return processResult(res)
	case err := <-errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

Fan-out/fan-in pattern:

```go
// Fan-out: Distribute work to multiple workers
// Fan-in: Collect results from the workers
func fanOutFanIn(ctx context.Context, inputs []Input) ([]Result, error) {
	// Fan-out: load all inputs into a channel and close it
	inCh := make(chan Input, len(inputs))
	for _, input := range inputs {
		inCh <- input
	}
	close(inCh)

	// Start workers
	var wg sync.WaitGroup
	outCh := make(chan Result, len(inputs)) // buffered so workers never block
	errCh := make(chan error, 10)

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for input := range inCh {
				result, err := process(ctx, input)
				if err != nil {
					select {
					case errCh <- err:
					default:
					}
					return
				}
				outCh <- result
			}
		}()
	}

	// Close the output channels once all workers are done
	go func() {
		wg.Wait()
		close(outCh)
		close(errCh)
	}()

	// Fan-in: collect results
	var results []Result
	for {
		select {
		case result, ok := <-outCh:
			if !ok {
				return results, nil
			}
			results = append(results, result)
		case err := <-errCh:
			if err != nil {
				return nil, err
			}
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```

Prevention

  • Always use context.WithTimeout for long-running operations
  • Check context.Done() in select statements for all goroutines
  • Use buffered channels when appropriate to prevent blocking
  • Close channels to signal shutdown to receivers
  • Use defer for WaitGroup.Done() and mutex.Unlock()
  • Implement graceful shutdown in all worker pools
  • Monitor goroutine count in production with alerts
  • Use goleak in tests to catch leaks early
  • Capture goroutine dumps on panic for debugging
Common Error Messages

  • **fatal error: all goroutines are asleep - deadlock!**: The runtime detected that no goroutine can make progress
  • **context deadline exceeded**: The operation exceeded its context timeout
  • **panic: send on closed channel**: A goroutine sent on a channel after it was closed
  • **panic: close of closed channel**: A channel was closed more than once
  • **goroutine profile: total N**: N is the current goroutine count; alert on unbounded growth