Introduction

NumPy arrays are stored in contiguous memory blocks. When creating large arrays, the total memory required (elements * dtype size) may exceed available RAM or the process address space limit. On 32-bit systems, the per-process limit is 2-4 GB. On 64-bit systems, the limit is physical RAM plus swap.
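That requirement is easy to estimate up front. As a quick sanity check, here is the calculation for the (200000, 10000) float64 shape used in the examples below:

```python
import numpy as np

# Memory required = number of elements * bytes per element
shape = (200_000, 10_000)
n_elements = np.prod(shape)
nbytes = n_elements * np.dtype(np.float64).itemsize
print(f"{nbytes / 2**30:.1f} GiB")  # ~14.9 GiB for float64
```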

Symptoms

  • numpy.core._exceptions.MemoryError: Unable to allocate 14.9 GiB for an array with shape (200000, 10000) and data type float64
  • System becomes unresponsive during allocation (thrashing)
  • Process killed by OOM killer on Linux
  • Allocation works on machine A but fails on machine B with less RAM

Common Causes

  • Loading entire datasets into memory without chunking
  • Using float64 when float32 would suffice (2x memory)
  • Creating intermediate arrays during computation that double memory
  • 32-bit Python process hitting 2 GB address space limit
  • Memory fragmentation preventing large contiguous allocation
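The intermediate-array cause above is worth illustrating: an expression like `c = a * 2 + b` allocates temporaries, roughly doubling peak memory. A minimal sketch of avoiding this with in-place ufuncs (array sizes here are illustrative):

```python
import numpy as np

a = np.ones(1_000_000, dtype=np.float32)
b = np.ones(1_000_000, dtype=np.float32)

# c = a * 2 + b would allocate two temporaries (a * 2, then the sum).
# In-place ufuncs with out= reuse an existing buffer instead:
np.multiply(a, 2, out=a)  # a *= 2, no temporary
np.add(a, b, out=a)       # a += b, no temporary
```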

Step-by-Step Fix

  1. Check current memory usage and array size:

```python
import psutil

arr_size = 200_000 * 10_000 * 8  # float64 = 8 bytes per element
print(f"Required: {arr_size / 1e9:.1f} GB")

mem = psutil.virtual_memory()
print(f"Available: {mem.available / 1e9:.1f} GB")
```

  2. Use smaller data types:

```python
import numpy as np

# Instead of the default float64 (8 bytes per element):
arr = np.zeros((200_000, 10_000), dtype=np.float32)  # 50% memory reduction
arr = np.zeros((200_000, 10_000), dtype=np.float16)  # 75% reduction
arr = np.zeros((200_000, 10_000), dtype=np.int32)    # 50% reduction
```

  3. Use memory-mapped files for out-of-core computation:

```python
import numpy as np

# Create the array on disk instead of in RAM
arr = np.memmap('large_array.dat', dtype='float32', mode='w+',
                shape=(200_000, 10_000))

# Access works like a normal array; data is stored on disk
arr[0, 0] = 3.14
arr.flush()  # Ensure data is written to disk
```
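Reopening the file later with mode='r' maps it read-only, so pages are loaded from disk on demand rather than pulled into RAM up front. A small sketch (the shape and temporary path are illustrative):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'small.dat')

# Write phase: create and populate the file-backed array
w = np.memmap(path, dtype='float32', mode='w+', shape=(100, 100))
w[0, 0] = 3.14
w.flush()
del w  # Release the writer

# Read phase: mode='r' maps the file read-only, loading pages on demand
r = np.memmap(path, dtype='float32', mode='r', shape=(100, 100))
print(r[0, 0])  # ~3.14
```

The shape is not stored in the .dat file, so it must be passed again when reopening; saving via np.save and reloading with np.load(..., mmap_mode='r') avoids that.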

  4. Process data in chunks. Note that np.load(filename)[start:end] would read the entire file on every iteration; opening it once with mmap_mode='r' makes each slice read only that chunk from disk:

```python
import numpy as np

def process_in_chunks(filename, n_rows=200_000, chunk_size=10_000):
    total = np.zeros(10_000, dtype=np.float64)
    data = np.load(filename, mmap_mode='r')  # Memory-map, don't load
    for start in range(0, n_rows, chunk_size):
        end = min(start + chunk_size, n_rows)
        chunk = np.asarray(data[start:end])  # Read only this chunk
        total += chunk.sum(axis=0)
        del chunk  # Free memory immediately
    return total
```
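A quick way to sanity-check chunked processing is to compare it against the full-array result on a small file. A self-contained sketch (shapes, chunk size, and the temporary path are illustrative):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'data.npy')
rng = np.random.default_rng(0)
np.save(path, rng.standard_normal((1_000, 50)))

# Full-array reference result
full = np.load(path).sum(axis=0)

# Chunked version: memory-map the file and accumulate per-chunk sums
data = np.load(path, mmap_mode='r')
total = np.zeros(50)
for start in range(0, data.shape[0], 100):
    total += np.asarray(data[start:start + 100]).sum(axis=0)

print(np.allclose(total, full))  # True
```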
  5. Use Dask for distributed out-of-core arrays:

```python
import dask.array as da
import numpy as np

# Creates a chunked, lazy array; nothing is loaded into memory yet
arr = da.zeros((200_000, 10_000), chunks=(10_000, 10_000), dtype=np.float32)
result = arr.mean(axis=0).compute()  # Loads only the chunks it needs
```

Prevention

  • Monitor memory with tracemalloc during development
  • Use arr.nbytes to check an array's memory footprint
  • Prefer generators and chunked processing for large datasets
  • Consider pyarrow for columnar data with efficient memory layout
  • Set resource limits: ulimit -v 8000000 (8 GB virtual memory) to catch issues early
  • Use numba with @njit for computation without creating intermediate arrays
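The tracemalloc suggestion above works for NumPy buffers because NumPy reports its data allocations to Python's tracer. A minimal sketch of measuring peak allocation during development (the array size is illustrative):

```python
import tracemalloc
import numpy as np

tracemalloc.start()
arr = np.ones((1_000, 1_000), dtype=np.float64)  # ~8 MB buffer
current, peak = tracemalloc.get_traced_memory()
print(f"peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```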