Introduction

Python's multiprocessing module uses the fork start method by default on Linux (macOS switched its default to spawn in Python 3.8), which copies the parent process's memory into each child. When threads or other active resources exist in the parent at fork time, children inherit inconsistent state -- locks held by threads that no longer exist, file descriptors shared with the parent, and thread-local data that corresponds to no actual thread, because only the forking thread survives in the child. The results are errors such as AssertionError: daemonic processes are not allowed to have children, silent deadlocks, and corrupted state that is difficult to debug because it only manifests under specific timing conditions.

Symptoms

```text
AssertionError: daemonic processes are not allowed to have children
  File "/usr/lib/python3.11/multiprocessing/process.py", line 123, in start
    assert not _current_process._config.get('daemon'), \
```

Or silent deadlocks:

```text
# Process hangs indefinitely after fork
# No error message, no output, CPU at 0%
```

Or corrupted thread state:

```text
RuntimeError: can't start new thread
  File "/usr/lib/python3.11/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()
```
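The first symptom can be reproduced in a few lines: multiprocessing.Pool workers are daemonic processes, so any attempt to start a child process from inside one trips the assertion shown above. A minimal repro (the worker and grandchild names are illustrative):

```python
import multiprocessing as mp

def grandchild():
    pass

def worker(_):
    # Pool workers run with daemon=True; starting a process from one
    # raises the AssertionError, which the pool re-raises in the parent.
    mp.Process(target=grandchild).start()

if __name__ == '__main__':
    with mp.Pool(1) as pool:
        try:
            pool.apply(worker, (None,))
        except AssertionError as exc:
            print(exc)  # daemonic processes are not allowed to have children
```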

Common Causes

  • Active threads at fork time: Background threads (logging, monitoring, timers) exist when fork happens
  • Held locks during fork: A mutex is locked in the parent, child inherits it locked forever
  • Daemon threads creating subprocesses: Daemon thread calls Process.start()
  • Global state initialized before fork: Database connections, HTTP clients created at import time
  • Thread pool executors not shut down: concurrent.futures.ThreadPoolExecutor still has workers
  • Using fork on Linux where spawn is needed: Fork is unsafe with multithreaded Python programs
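For the held-lock cause specifically, the pattern the standard library itself uses (the logging module, for instance) is os.register_at_fork (Unix-only, Python 3.7+): acquire the lock in the parent immediately before the fork and release it on both sides afterwards, so the child can never inherit it mid-operation. A minimal sketch with a hypothetical log_lock:

```python
import os
import threading

log_lock = threading.Lock()

# Ensure the lock is never held mid-operation at the moment of fork:
# the parent takes it just before forking, then the parent's copy and
# the child's copy are each released once the fork completes.
os.register_at_fork(
    before=log_lock.acquire,
    after_in_parent=log_lock.release,
    after_in_child=log_lock.release,
)
```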

Step-by-Step Fix

Step 1: Use spawn start method

```python
import multiprocessing as mp

def worker(data):
    ...  # real work goes here

if __name__ == '__main__':
    # Set the start method BEFORE creating any processes. It can only be
    # set once per program (pass force=True to override an earlier choice).
    # 'spawn' is also the only method Windows supports, so it is portable.
    mp.set_start_method('spawn')

    # Alternatively, use a context object, which leaves the global
    # default untouched -- the right choice inside library code:
    ctx = mp.get_context('spawn')
    process = ctx.Process(target=worker, args=([1, 2, 3],))
    process.start()
    process.join()
```

Step 2: Clean up resources before forking

```python
import threading
import multiprocessing as mp

def check_before_fork():
    """Report threads that should be shut down before forking; only the
    forking thread survives in the child, so any other live thread is a
    deadlock risk."""
    for thread in threading.enumerate():
        if thread is not threading.current_thread():
            print(f"Active thread: {thread.name}")

def fork_worker(data):
    return process_data(data)  # placeholder for real work

if __name__ == '__main__':
    # Shut down thread pools, timers, etc. first, then verify:
    check_before_fork()

    # Create the pool only after all initialization is complete
    with mp.Pool(processes=4) as pool:
        results = pool.map(fork_worker, data_items)
```
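The same discipline applies to thread pools (the concurrent.futures cause listed above): shut the executor down with wait=True so its worker threads have actually exited before any process pool is created. A short sketch:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
futures = [executor.submit(pow, 2, n) for n in range(8)]

# Block until every task finishes AND all worker threads have exited.
executor.shutdown(wait=True)

# Verify no executor threads remain before any fork happens.
leftover = [t for t in threading.enumerate()
            if t.name.startswith('ThreadPoolExecutor')]
print(len(leftover))  # 0
```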

Step 3: Use initializer to set up clean child state

```python
import multiprocessing as mp

# Module-level state that each child will initialize for itself
_db_connection = None
_http_client = None

def init_worker():
    """Initialize resources cleanly in each child process."""
    global _db_connection, _http_client
    # Each child creates its own connections instead of inheriting them
    _db_connection = create_db_connection()
    _http_client = create_http_client()
    print(f"Worker {mp.current_process().name} initialized")

def worker_task(item):
    # Uses per-process resources, not ones inherited from the parent
    result = _db_connection.query(item)
    return _http_client.post("/results", json=result)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(worker_task, items)
```
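The helpers above (create_db_connection and so on) are placeholders; a fully runnable variant of the same pattern uses initargs to hand configuration to each child's initializer (the scale config key here is made up for illustration):

```python
import multiprocessing as mp

_config = None

def init_worker(config):
    # Runs once in each child process; stores per-process state.
    global _config
    _config = config

def worker_task(item):
    # Uses the per-process config rather than state inherited via fork.
    return _config['scale'] * item

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(2, initializer=init_worker,
                  initargs=({'scale': 10},)) as pool:
        print(pool.map(worker_task, [1, 2, 3]))  # [10, 20, 30]
```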

Prevention

  • Always use the spawn start method on Linux for multithreaded applications
  • Initialize multiprocessing pools after all application startup is complete
  • Use initializer functions to set up per-process resources cleanly
  • Never create threads at module import time -- lazy-initialize them
  • Shut down thread pools and close connections before creating process pools
  • Add if __name__ == '__main__': guard around all process creation code
  • Test multiprocessing code on both Linux and macOS to catch fork/spawn differences