Introduction
scikit-learn models are serialized with pickle/joblib, which captures the internal state of the model object, including private attributes and internal data structures. When the scikit-learn version used to load a model differs from the version that saved it, deserialization can fail with errors such as ValueError: buffer source array is read-only or AttributeError: module 'sklearn' has no attribute, or it can produce silently incorrect predictions. This is a critical production issue: model loading failures block inference pipelines, and version mismatches can produce wrong predictions without any error.
Symptoms
```
ValueError: buffer source array is read-only
  File "sklearn/tree/_tree.pyx", line 716, in sklearn.tree._tree.Tree.__setstate__
```

Or:

```
AttributeError: Can't get attribute '_n_features_in_' on <module 'sklearn.base'>
  File "/usr/lib/python3.11/pickle.py", line 1234, in load_newobj
```

Or silent prediction errors:

```python
# Model loaded without error but predictions are wrong
model.predict(X)  # Returns different values than when the model was trained
```

Common Causes
- scikit-learn version mismatch: Model saved with 1.2, loaded with 1.4
- Python version difference: Pickle protocol differences between Python versions
- Custom pipeline components: Custom transformers not importable at load time
- Feature count mismatch: Model trained with different number of features
- Corrupted model file: Incomplete save or file transfer corruption
- NumPy version incompatibility: NumPy array format changes between versions
Step-by-Step Fix
Step 1: Pin and verify versions during load
```python
import json
import platform

import joblib
import sklearn

# Save model WITH version metadata
def save_model(model, path):
    joblib.dump(model, path)
    # Save version info alongside the pickle
    with open(path + '.meta', 'w') as f:
        json.dump({
            'sklearn_version': sklearn.__version__,
            'python_version': platform.python_version(),
            'n_features': getattr(model, 'n_features_in_', None),
        }, f)

# Load model WITH version verification
def load_model_safe(path):
    with open(path + '.meta') as f:
        meta = json.load(f)

    saved_version = meta['sklearn_version']
    current_version = sklearn.__version__

    if saved_version != current_version:
        print(f"WARNING: Model saved with sklearn {saved_version}, "
              f"loading with {current_version}")

    # Check major version compatibility
    saved_major = int(saved_version.split('.')[0])
    current_major = int(current_version.split('.')[0])
    if saved_major != current_major:
        raise ValueError(
            f"Incompatible sklearn versions: {saved_version} vs {current_version}"
        )

    model = joblib.load(path)
    return model
```
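If you want to compare full versions rather than only the major component, note that naive int(part) parsing crashes on pre-release strings like "1.4.0rc1". A tolerant sketch (parse_version is a hypothetical helper, not a sklearn or joblib API):

```python
def parse_version(version):
    # Turn "1.4.0" (or "1.4.0rc1") into a comparable tuple of ints,
    # keeping only the leading digits of each dotted component.
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

print(parse_version('1.4.0rc1'))  # (1, 4, 0)
```

Tuples of ints compare element-wise, so `parse_version(saved) < parse_version(current)` gives a sensible ordering without extra dependencies.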
Step 2: Use ONNX for cross-version model export
```python
# Convert a sklearn model to ONNX for version-independent serving
# pip install skl2onnx
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

def export_model_onnx(sklearn_model, output_path, n_features):
    initial_type = [('input', FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(sklearn_model, initial_types=initial_type)

    with open(output_path, 'wb') as f:
        f.write(onnx_model.SerializeToString())

# Load the ONNX model (independent of sklearn version)
# pip install onnxruntime
import onnxruntime as ort

session = ort.InferenceSession('model.onnx')
predictions = session.run(None, {'input': X.astype('float32')})[0]
```
Step 3: Containerize model with dependencies
```dockerfile
# Dockerfile for model serving
FROM python:3.11-slim

# Pin the exact versions used at training time
RUN pip install --no-cache-dir \
    scikit-learn==1.4.0 \
    numpy==1.26.3 \
    joblib==1.3.2

# Copy the model and the version metadata written by save_model
COPY model.pkl /app/model.pkl
COPY model.pkl.meta /app/model.pkl.meta
COPY serve.py /app/serve.py

WORKDIR /app
CMD ["python", "serve.py"]
```
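The serve.py entrypoint referenced in the Dockerfile is not shown here; a minimal stdlib-only sketch might look like the following (the JSON request/response shape, the port, and all helper names are our assumptions, not part of any framework):

```python
# serve.py (sketch): assumes model.pkl exists in the working directory
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict_payload(model, payload):
    # {"instances": [[...], ...]} -> {"predictions": [...]}
    preds = model.predict(payload['instances'])
    return {'predictions': [float(p) for p in preds]}

class Handler(BaseHTTPRequestHandler):
    model = None  # set once at startup

    def do_POST(self):
        length = int(self.headers['Content-Length'])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict_payload(self.model, payload)).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    import joblib
    Handler.model = joblib.load('model.pkl')
    HTTPServer(('', 8000), Handler).serve_forever()
```

Loading the model once at startup (rather than per request) also means a version mismatch fails the container immediately instead of at the first inference call.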
Prevention
- Always save model version metadata alongside pickle files
- Use ONNX export for models that need to be served across version boundaries
- Containerize model inference with pinned dependency versions
- Add version compatibility checks to model loading code
- Maintain a model registry that tracks training environment and dependencies
- Test model loading with the exact versions used in production before deployment
- For critical models, save predictions on a validation set alongside the model for integrity checks