Introduction
boto3 S3 multipart uploads split large files into parts that are uploaded independently and then combined. When a network interruption, memory corruption, or an incorrect checksum calculation causes a part's checksum to mismatch, S3 rejects the part with a checksum error such as BadDigest; an undersized part fails the upload with EntityTooSmall. Without retry handling around the upload, a single corrupted part fails the entire upload of potentially gigabytes of data. Using boto3's S3 transfer manager (TransferConfig) with retry configuration and checksum validation keeps uploads reliable even on unstable connections.
Symptoms
```
botocore.exceptions.ClientError: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size
```

Or a checksum mismatch:

```
botocore.exceptions.ClientError: An error occurred (BadDigest) when calling the UploadPart operation: The Content-MD5 you specified was invalid.
```

Or:

```
botocore.exceptions.ClientError: An error occurred (RequestTimeout) when calling the UploadPart operation: Your socket connection to the server was not read from or written to within the timeout period.
```

Common Causes
- Network interruption during part upload: Part partially uploaded, checksum mismatch
- Chunk size too small: Parts below 5MB minimum rejected by S3
- Memory corruption during read: File content changed between read and upload
- Default upload method not using multipart: Large files uploaded as single request
- Incomplete multipart upload cleanup: Aborted uploads accumulating storage costs
- Wrong Content-MD5 header: Manually calculated checksum does not match sent data
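Several of these causes are part-size problems that can be caught before any bytes leave the machine. A minimal pre-flight sketch (the helper name is mine; the 5MB minimum part size and 10,000-part maximum are S3's documented multipart limits):

```python
# Hypothetical pre-flight check: validate a planned chunk size against S3's
# multipart limits before starting an upload that would otherwise fail
# with EntityTooSmall at CompleteMultipartUpload time.

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload

def validate_multipart_plan(file_size, chunksize):
    """Return the resulting part count, or raise if the plan violates S3 limits."""
    if chunksize < MIN_PART_SIZE:
        raise ValueError(f"chunk size {chunksize} is below S3's 5MB minimum")
    parts = -(-file_size // chunksize)  # ceiling division
    if parts > MAX_PARTS:
        raise ValueError(f"{parts} parts exceeds S3's 10,000-part limit")
    return parts

# A 1 GiB file with 8MB parts uploads as 128 parts
print(validate_multipart_plan(1024 ** 3, 8 * 1024 * 1024))  # -> 128
```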
Step-by-Step Fix
Step 1: Use S3 Transfer Manager with multipart
```python
import boto3
from boto3.s3.transfer import TransferConfig

# Configure multipart upload
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8MB -- use multipart above this size
    multipart_chunksize=8 * 1024 * 1024,  # 8MB per part (S3 minimum is 5MB)
    max_concurrency=10,                   # up to 10 parts in flight at once
    use_threads=True,                     # use threads for concurrency
)

s3 = boto3.client('s3')

s3.upload_file(
    Filename='large_file.dat',
    Bucket='my-bucket',
    Key='data/large_file.dat',
    Config=config,
    ExtraArgs={
        'ChecksumAlgorithm': 'SHA256',  # enable S3 checksum validation
    },
)
```
Step 2: Handle upload failures with retry
```python
import hashlib
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Configure botocore's built-in retries on the client
s3 = boto3.client(
    's3',
    config=Config(
        retries={
            'max_attempts': 10,
            'mode': 'adaptive',  # backoff plus client-side rate limiting
        },
    ),
)

def upload_with_retry(file_path, bucket, key, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            s3.upload_file(file_path, bucket, key)
            print(f"Upload successful: {key}")
            return True
        except ClientError as e:
            error_code = e.response['Error']['Code']
            print(f"Attempt {attempt + 1} failed: {error_code}")

            if error_code in ('SlowDown', 'RequestTimeout'):
                time.sleep(2 ** attempt)  # exponential backoff
                continue
            elif error_code == 'BadDigest':
                # File may be corrupted -- verify locally before retrying
                with open(file_path, 'rb') as f:
                    sha256 = hashlib.sha256(f.read()).hexdigest()
                print(f"Local SHA256: {sha256}")
                continue
            else:
                raise
    return False
```
Step 3: Clean up aborted multipart uploads
```python
import datetime

import boto3

def cleanup_aborted_uploads(bucket, prefix='', max_age_hours=24):
    """Abort multipart uploads that have been abandoned."""
    s3 = boto3.client('s3')

    # List incomplete multipart uploads (paginated, in case there are many)
    paginator = s3.get_paginator('list_multipart_uploads')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for upload in page.get('Uploads', []):
            initiated = upload['Initiated']
            age = datetime.datetime.now(datetime.timezone.utc) - initiated

            if age > datetime.timedelta(hours=max_age_hours):
                s3.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload['Key'],
                    UploadId=upload['UploadId'],
                )
                print(f"Aborted: {upload['Key']} (age: {age})")

# Run as a scheduled task
cleanup_aborted_uploads('my-bucket', max_age_hours=24)
```
Prevention
- Use TransferConfig with multipart_chunksize >= 5MB (S3 minimum)
- Enable ChecksumAlgorithm for server-side checksum validation
- Configure adaptive retry mode in botocore for automatic retry of transient errors
- Implement upload lifecycle management to abort stale multipart uploads
- Use S3 Lifecycle rules to automatically expire incomplete multipart uploads after 7 days
- Monitor multipart upload metrics with CloudWatch for failure rate trends
- For very large files (>5GB), consider using S3's multipart upload API directly for more control
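The lifecycle rule mentioned above can also be applied from boto3. A sketch, assuming a placeholder bucket name; `AbortIncompleteMultipartUpload` with `DaysAfterInitiation` is the standard S3 lifecycle action for expiring incomplete uploads:

```python
import boto3

# Lifecycle rule telling S3 to abort incomplete multipart uploads 7 days
# after they were initiated, releasing the storage held by their parts.
lifecycle_config = {
    'Rules': [
        {
            'ID': 'abort-incomplete-multipart-uploads',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},  # empty prefix: apply to the whole bucket
            'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7},
        },
    ],
}

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',  # placeholder bucket name
    LifecycleConfiguration=lifecycle_config,
)
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so merge this rule with any existing rules rather than overwriting them.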