# How to Fix MemoryError When Processing Large Files in Python
A `MemoryError` is raised when Python cannot allocate enough RAM, most often while processing large files. This guide shows memory-efficient techniques for handling large datasets.
## Error Pattern

```
Traceback (most recent call last):
  File "script.py", line 15, in <module>
    data = f.read()
MemoryError
```

Or, when NumPy cannot allocate an array:

```
MemoryError: Unable to allocate 10.0 GiB for an array with shape (10000, 10000) and data type float64
```

## Problematic Code Patterns
### Loading the Entire File

```python
# DON'T: Load the entire file into memory
with open('large_file.txt', 'r') as f:
    content = f.read()  # MemoryError for large files
lines = content.split('\n')
```

### Reading All Lines
```python
# DON'T: Read all lines at once
with open('large_file.txt', 'r') as f:
    lines = f.readlines()  # Creates a list of every line
for line in lines:
    process(line)
```

### Creating Large Lists
```python
# DON'T: Build huge lists in memory
results = []
for i in range(100_000_000):
    results.append(complex_calculation(i))  # MemoryError
```

## Solutions
### Solution 1: Process Line by Line

```python
# DO: Process one line at a time
with open('large_file.txt', 'r') as f:
    for line in f:
        process(line)
```

### Solution 2: Use Chunked Reading
```python
def read_in_chunks(file_path, chunk_size=8192):
    """Read a file in fixed-size byte chunks."""
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_in_chunks('large_file.bin'):
    process_chunk(chunk)
```
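As a concrete use of chunked reading, the pattern below hashes a file of any size in constant memory. This is an illustrative sketch; the demo file contents and the 8 KiB chunk size are arbitrary choices.

```python
import hashlib
import tempfile

def file_sha256(path, chunk_size=8192):
    """Hash a file in fixed-size chunks, never holding it all in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a small temporary file standing in for a large one
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello world\n' * 10_000)
    demo_path = tmp.name

print(file_sha256(demo_path))
```

The chunk size trades syscall overhead against memory: larger chunks mean fewer reads, but memory use stays bounded either way.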
### Solution 3: Use Generators

```python
# DON'T: Return a list
def get_all_records(file_path):
    records = []
    with open(file_path, 'r') as f:
        for line in f:
            records.append(parse_record(line))
    return records  # Everything in memory at once

# DO: Use a generator
def get_records(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield parse_record(line)  # One record at a time

for record in get_records('large_file.csv'):
    process(record)
```
### Solution 4: Process CSV with Pandas Chunks

```python
import pandas as pd

# Process the CSV in chunks of 10,000 rows
chunk_size = 10000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process_chunk(chunk)
```
### Solution 5: Use Memory-Efficient Data Types

```python
import pandas as pd

# Specify dtypes to save memory
dtypes = {
    'id': 'int32',          # Instead of int64
    'price': 'float32',     # Instead of float64
    'category': 'category'  # For strings with few unique values
}

df = pd.read_csv('large_file.csv', dtype=dtypes)
```
### Solution 6: Filter Early

```python
import pandas as pd

# Only load the columns you need
df = pd.read_csv('large_file.csv', usecols=['id', 'name', 'value'])

# Skip rows during the read (row 0 is the header;
# should_skip is your own predicate)
df = pd.read_csv('large_file.csv',
                 usecols=['id', 'status'],
                 skiprows=lambda x: x > 0 and should_skip(x))
```
### Solution 7: Use Memory-Mapped Files

```python
import mmap

with open('large_file.txt', 'rb') as f:  # binary mode for mmap
    # Map the file into memory without reading it all
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # mmap yields bytes, so the sentinel is b''
    for line in iter(mm.readline, b''):
        process(line)

    mm.close()
```
### Solution 8: Process JSON Files Efficiently

```python
import json
import ijson  # pip install ijson

# DON'T: Load the entire JSON document
with open('large.json', 'r') as f:
    data = json.load(f)  # MemoryError

# DO: Stream items with ijson
with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item'):
        process(item)
```
### Solution 9: Use Dask for Large Datasets

```python
import dask.dataframe as dd

# Dask partitions the CSV and handles larger-than-memory datasets
ddf = dd.read_csv('large_file.csv')
result = ddf.groupby('category').value.sum().compute()
```
### Solution 10: Write Output Incrementally

```python
# DON'T: Build the whole output in memory
output = []
for item in items:
    output.append(transform(item))
with open('output.txt', 'w') as f:
    f.write('\n'.join(output))

# DO: Write as you go
with open('output.txt', 'w') as f:
    for item in items:
        f.write(transform(item) + '\n')
```
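Reading line by line and writing incrementally combine into a stream-in/stream-out pipeline that holds only one line in memory at a time. A runnable sketch, where `transform` and the file contents are placeholders:

```python
import tempfile

def transform(line):
    # Placeholder transformation for the demo: upper-case each line
    return line.strip().upper()

# Create a small input file standing in for a large one
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt') as f:
    f.write('alpha\nbeta\ngamma\n')
    in_path = f.name
out_path = in_path + '.out'

# Read one line, transform it, write it -- no list is ever built
with open(in_path) as src, open(out_path, 'w') as dst:
    for line in src:
        dst.write(transform(line) + '\n')

with open(out_path) as f:
    print(f.read())
```

Peak memory here is independent of file size, so the same code works for a 10 GB input.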
## Memory Profiling

### Check Memory Usage

```python
import os
import psutil  # pip install psutil

process = psutil.Process(os.getpid())
print(f"Memory: {process.memory_info().rss / 1024 / 1024:.2f} MB")
```
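If installing psutil is not an option, the standard library's `tracemalloc` tracks Python-level allocations. A minimal sketch; the million-element list is just a stand-in for real workload data:

```python
import tracemalloc

tracemalloc.start()

data = [0] * 1_000_000  # roughly 8 MB of pointer storage on 64-bit CPython
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB, peak: {peak / 1024 / 1024:.2f} MB")

tracemalloc.stop()
```

Note that `tracemalloc` only sees allocations made through Python's allocator, while psutil's RSS figure covers the whole process.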
### Use Memory Profiler

```bash
pip install memory-profiler
python -m memory_profiler script.py
```

```python
from memory_profiler import profile

@profile
def my_function():
    # Your code here
    pass
```
## Specialized Libraries

### For Large Text Files

```python
# fileinput streams lines across multiple files
import fileinput

for line in fileinput.input(['file1.txt', 'file2.txt']):
    process(line)
```
### For Large CSV Files

```python
import csv

# The csv module streams rows instead of loading the whole file
with open('large.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        process(row)
```
### For Large XML Files

```python
import xml.etree.ElementTree as ET

# iterparse streams the document element by element
for event, elem in ET.iterparse('large.xml', events=('end',)):
    if elem.tag == 'record':
        process(elem)
        elem.clear()  # Free memory for processed elements
```
Prevention Tips
- 1.Never load entire large files - always stream or chunk
- 2.Use generators instead of lists for large sequences
- 3.Profile memory usage before deploying to production
- 4.Consider databases for very large datasets (SQLite, PostgreSQL)
- 5.Use appropriate data types (int32 vs int64, float32 vs float64)
- 6.Delete unused variables:
del large_variable
# Force garbage collection
import gc
gc.collect()
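Tip 2 above is easy to verify: `sys.getsizeof` shows that a generator stays tiny no matter how many items it will yield. A quick stdlib check (exact sizes vary by Python version):

```python
import sys

squares_list = [i * i for i in range(1_000_000)]   # every element materialized
squares_gen = (i * i for i in range(1_000_000))    # lazy: computed on demand

print(f"List: {sys.getsizeof(squares_list) / 1024 / 1024:.1f} MB")
print(f"Generator: {sys.getsizeof(squares_gen)} bytes")
```

The list's size here counts only its internal pointer array; the integer objects it references add several megabytes more, while the generator computes each value on demand and discards it.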