Introduction
Python's multiprocessing module uses fork by default on Unix, which copies the parent process's memory space into child processes. When daemon threads or other active resources exist in the parent at fork time, child processes inherit inconsistent state -- locks that are already held, file descriptors that are open in the parent, and thread-local data that no longer corresponds to actual threads. This causes AssertionError: daemonic processes are not allowed to have children, deadlocks, and corrupted state that is difficult to debug because it only manifests under specific timing conditions.
Symptoms
```
AssertionError: daemonic processes are not allowed to have children
  File "/usr/lib/python3.11/multiprocessing/process.py", line 123, in start
    assert not _current_process._config.get('daemon'), \
```

Or silent deadlocks:

```
# Process hangs indefinitely after fork
# No error message, no output, CPU at 0%
```

Or corrupted thread state:

```
RuntimeError: can't start new thread
  File "/usr/lib/python3.11/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()
```

Common Causes
- Active threads at fork time: Background threads (logging, monitoring, timers) exist when fork happens
- Held locks during fork: A mutex is locked in the parent, child inherits it locked forever
- Daemon threads creating subprocesses: Daemon thread calls Process.start()
- Global state initialized before fork: Database connections, HTTP clients created at import time
- Thread pool executors not shut down: concurrent.futures.ThreadPoolExecutor still has workers
- Using fork on Linux where spawn is needed: Fork is unsafe with multithreaded Python programs
Step-by-Step Fix
Step 1: Use spawn start method
```python
import multiprocessing as mp

# Set the spawn method BEFORE creating any processes.
# force=True replaces a start method that was already set elsewhere.
mp.set_start_method('spawn', force=True)

# Or select at runtime
def get_safe_start_method():
    # Windows only supports spawn; on Unix, spawn is also the safe
    # choice for multithreaded programs.
    return 'spawn'

# Usage
ctx = mp.get_context('spawn')
process = ctx.Process(target=worker, args=(data,))
process.start()
process.join()
```
Step 2: Clean up resources before forking
```python
import threading
import multiprocessing as mp

def warn_about_active_threads():
    """Report non-main threads that would be copied into fork children."""
    for thread in threading.enumerate():
        if thread is not threading.current_thread():
            print(f"Active thread at fork time: {thread.name}")

def fork_worker(data):
    return process_data(data)

# Shut down executors and join background threads first, then check
# and create the pool only after all initialization is complete
warn_about_active_threads()
with mp.Pool(processes=4) as pool:
    results = pool.map(fork_worker, data_items)
```
Step 3: Use initializer to set up clean child state
```python
import multiprocessing as mp

# Module-level state that children will initialize
_db_connection = None
_http_client = None

def init_worker():
    """Initialize resources cleanly in each child process."""
    global _db_connection, _http_client
    # Each child creates its own connections
    _db_connection = create_db_connection()
    _http_client = create_http_client()
    print(f"Worker {mp.current_process().name} initialized")

def worker_task(item):
    # Uses per-process resources, not inherited ones
    result = _db_connection.query(item)
    return _http_client.post("/results", json=result)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(worker_task, items)
```
Prevention
- Always use the `spawn` start method on Linux for multithreaded applications
- Initialize multiprocessing pools after all application startup is complete
- Use initializer functions to set up per-process resources cleanly
- Never create threads at module import time -- lazy-initialize them
- Shut down thread pools and close connections before creating process pools
- Add an `if __name__ == '__main__':` guard around all process creation code
- Test multiprocessing code on both Linux and macOS to catch fork/spawn differences