# How to Fix Python MemoryError When Processing Large Files

`MemoryError` is raised when Python cannot allocate the memory it needs, which commonly happens while processing large files. This guide shows memory-efficient techniques for handling large datasets.

## Error Pattern

```text
Traceback (most recent call last):
  File "script.py", line 15, in <module>
    data = f.read()
MemoryError
```

Or, when NumPy cannot allocate an array:

```text
MemoryError: Unable to allocate 10.0 GiB for an array with shape (10000, 10000) and data type float64
```

## Problematic Code Patterns

### Loading Entire File

```python
# DON'T: Load entire file into memory
with open('large_file.txt', 'r') as f:
    content = f.read()  # MemoryError for large files
    lines = content.split('\n')
```

### Reading All Lines

```python
# DON'T: Read all lines at once
with open('large_file.txt', 'r') as f:
    lines = f.readlines()  # Builds a list holding every line
    for line in lines:
        process(line)
```

### Creating Large Lists

```python
# DON'T: Accumulate huge lists in memory
results = []
for i in range(100_000_000):
    results.append(complex_calculation(i))  # MemoryError
```

## Solutions

### Solution 1: Process Line by Line

```python
# DO: Iterate over the file object; only one line is held in memory at a time
with open('large_file.txt', 'r') as f:
    for line in f:
        process(line)
```

### Solution 2: Use Chunked Reading

```python
def read_in_chunks(file_path, chunk_size=8192):
    """Read a file in chunks of bytes."""
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_in_chunks('large_file.bin'):
    process_chunk(chunk)
```

### Solution 3: Use Generators

```python
# DON'T: Return a list
def get_all_records(file_path):
    records = []
    with open(file_path, 'r') as f:
        for line in f:
            records.append(parse_record(line))
    return records  # All records in memory

# DO: Use a generator
def get_records(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield parse_record(line)  # One record at a time

for record in get_records('large_file.csv'):
    process(record)
```
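Generators also compose with `itertools.islice`, so sampling the first few records never touches the rest of the file. A minimal sketch with a stand-in generator (the infinite `numbers` generator is hypothetical, standing in for a lazy record parser):

```python
from itertools import islice

def numbers():
    """Stand-in generator; imagine it yields parsed records lazily."""
    n = 0
    while True:
        yield n
        n += 1

# Take only the first five items; the rest of the stream is never produced
first_five = list(islice(numbers(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

The same `islice` call works on `get_records(...)` above, which is useful for inspecting a sample of a huge file before processing all of it.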

### Solution 4: Process CSV with Pandas Chunks

```python
import pandas as pd

# Process the CSV in chunks of 10,000 rows
chunk_size = 10000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process_chunk(chunk)
```

### Solution 5: Use Memory-Efficient Data Types

```python
import pandas as pd

# Specify dtypes to save memory
dtypes = {
    'id': 'int32',           # Instead of int64
    'price': 'float32',      # Instead of float64
    'category': 'category',  # For strings with few unique values
}

df = pd.read_csv('large_file.csv', dtype=dtypes)
```

### Solution 6: Filter Early

```python
import pandas as pd

# Only load the columns you need
df = pd.read_csv('large_file.csv', usecols=['id', 'name', 'value'])

# Skip rows during the read (should_skip is your own predicate;
# row 0 is kept because it holds the header)
df = pd.read_csv('large_file.csv',
                 usecols=['id', 'status'],
                 skiprows=lambda x: x > 0 and should_skip(x))
```
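When the filter depends on column values rather than row numbers, a common pattern is to combine `chunksize` with a per-chunk filter and concatenate the survivors, so only matching rows accumulate in memory. A sketch, using a small inline CSV (the `status` column and its values are hypothetical example data):

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk
csv_data = io.StringIO(
    "id,status\n"
    "1,active\n"
    "2,inactive\n"
    "3,active\n"
    "4,inactive\n"
)

# Filter each chunk as it is read; only the matching rows are kept
chunks = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    chunks.append(chunk[chunk['status'] == 'active'])

df = pd.concat(chunks, ignore_index=True)
print(len(df))  # 2
```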

### Solution 7: Use Memory-Mapped Files

```python
import mmap

with open('large_file.txt', 'rb') as f:
    # Map the file into memory; pages are loaded on demand
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Process without loading the entire file
    for line in iter(mm.readline, b''):
        process(line)

    mm.close()
```

Note the file is opened in binary mode (`'rb'`): `mmap` works on the raw file descriptor, and `mm.readline()` returns bytes.

### Solution 8: Process JSON Files Efficiently

```python
import json
import ijson  # pip install ijson

# DON'T: Load the entire JSON document
with open('large.json', 'r') as f:
    data = json.load(f)  # MemoryError

# DO: Stream items with ijson
with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item'):
        process(item)
```

### Solution 9: Use Dask for Large Datasets

```python
import dask.dataframe as dd

# Dask partitions the work, so the dataset never has to fit in memory at once
ddf = dd.read_csv('large_file.csv')
result = ddf.groupby('category').value.sum().compute()
```

### Solution 10: Write Output Incrementally

```python
# DON'T: Build the whole output in memory
output = []
for item in items:
    output.append(transform(item))
with open('output.txt', 'w') as f:
    f.write('\n'.join(output))

# DO: Write each result as it is produced
with open('output.txt', 'w') as f:
    for item in items:
        f.write(transform(item) + '\n')
```

## Memory Profiling

### Check Memory Usage

```python
import os

import psutil  # pip install psutil

process = psutil.Process(os.getpid())
print(f"Memory: {process.memory_info().rss / 1024 / 1024:.2f} MB")
```
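If you prefer to stay in the standard library, `tracemalloc` tracks Python allocations without any third-party dependency. A minimal sketch:

```python
import tracemalloc

tracemalloc.start()

data = [i * i for i in range(100_000)]  # Allocate something measurable

# Current and peak traced memory, in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB, Peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```

Unlike `psutil`, which reports the whole process's resident memory, `tracemalloc` only counts allocations made by Python code, which makes it better for pinpointing which lines allocate the most.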

### Use Memory Profiler

```bash
pip install memory-profiler
python -m memory_profiler script.py
```

```python
from memory_profiler import profile

@profile
def my_function():
    # Your code here
    pass
```

## Specialized Libraries

### For Large Text Files

```python
# fileinput iterates lazily over multiple files
import fileinput

for line in fileinput.input(['file1.txt', 'file2.txt']):
    process(line)
```

### For Large CSV Files

```python
import csv

# The csv reader yields one row at a time instead of loading the whole file
with open('large.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        process(row)
```

### For Large XML Files

```python
import xml.etree.ElementTree as ET

# iterparse streams the document instead of building the full tree
for event, elem in ET.iterparse('large.xml', events=('end',)):
    if elem.tag == 'record':
        process(elem)
        elem.clear()  # Free the element's memory
```

## Prevention Tips

1. Never load entire large files - always stream or chunk
2. Use generators instead of lists for large sequences
3. Profile memory usage before deploying to production
4. Consider databases for very large datasets (SQLite, PostgreSQL)
5. Use appropriate data types (int32 vs int64, float32 vs float64)
6. Delete unused variables and force garbage collection when needed:

```python
import gc

del large_variable  # Drop the last reference
gc.collect()        # Force garbage collection
```
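The database tip above can be sketched with the standard-library `sqlite3` module: rows are inserted from a generator and queried through a cursor, so the full dataset never sits in Python memory at once. The `records` table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # Use a file path for real datasets
conn.execute('CREATE TABLE records (id INTEGER, value REAL)')

# executemany consumes the generator row by row; no list is built
rows = ((i, i * 0.5) for i in range(10_000))
conn.executemany('INSERT INTO records VALUES (?, ?)', rows)

# The cursor yields one row at a time; the result set is not materialized
total = 0.0
for (value,) in conn.execute('SELECT value FROM records WHERE id < 100'):
    total += value

print(total)  # 2475.0
conn.close()
```

For datasets that outlive one run, point `connect()` at a file instead of `':memory:'` and let SQLite handle the on-disk storage and indexing.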