Introduction
S3 multipart uploads fail in production for several reasons: network interruptions during part upload, part checksum mismatches, timeout on large parts, or S3 throttling when uploading many parts concurrently. Unlike single PUT uploads, multipart uploads require managing upload state across multiple HTTP requests. If any part fails, the entire upload must be either retried from that part or aborted and restarted. Without proper retry configuration, boto3 gives up after the default retry limit, leaving incomplete multipart uploads that consume S3 storage and incur costs indefinitely.
Symptoms
```
botocore.exceptions.ClientError: An error occurred (RequestTimeout) when calling the UploadPart operation
(reached max retries: 4): Your socket connection to the server was not read from or written to within the timeout period.
```

Or:

```
botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the UploadPart operation
(reached max retries: 4): Please reduce your request rate.
```

Orphaned multipart uploads detected by lifecycle policy:
```
$ aws s3api list-multipart-uploads --bucket my-bucket
{
    "Uploads": [
        {
            "Key": "data/export-2024-03-15.csv",
            "UploadId": "abc123",
            "Initiated": "2024-03-15T10:00:00.000Z",
            "StorageClass": "STANDARD"
        }
    ]
}
```

Common Causes
- Default retry configuration too conservative: botocore's default (legacy) retry mode covers fewer transient S3 error codes than the standard or adaptive modes and does no client-side throttling
- Part size too large: 100MB+ parts take too long to upload and hit the socket timeout
- Too many concurrent part uploads: Exceeding S3 per-prefix rate limits triggers SlowDown errors
- Incomplete upload not aborted: Failed uploads are never cleaned up, accumulating storage costs
- Network instability on EC2: Instance network throughput fluctuation causes part upload timeouts
- Missing idempotency: retrying re-uploads the same file, creating duplicate keys
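Because every uploaded part of an abandoned upload is billed until the upload is aborted, it helps to measure how much storage orphans are actually holding. A sketch using boto3's `list_multipart_uploads` and `list_parts` paginators; the `human` formatting helper and the lazy boto3 import are illustrative choices so the formatting logic works without AWS access:

```python
def human(n: float) -> str:
    """Format a byte count for logs, e.g. 26214400 -> '25.0 MiB'."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n:.1f} {unit}"
        n /= 1024

def incomplete_upload_bytes(bucket: str) -> int:
    """Sum the bytes held by all in-progress multipart uploads in a bucket."""
    import boto3  # imported lazily so human() above stays usable without AWS
    s3 = boto3.client("s3")
    uploads = s3.get_paginator("list_multipart_uploads")
    parts = s3.get_paginator("list_parts")
    total = 0
    for page in uploads.paginate(Bucket=bucket):
        for up in page.get("Uploads", []):
            # Each in-progress upload lists its already-uploaded parts with sizes.
            for part_page in parts.paginate(
                Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"]
            ):
                total += sum(p["Size"] for p in part_page.get("Parts", []))
    return total

if __name__ == "__main__":
    print(human(incomplete_upload_bytes("my-bucket")))  # bucket name is a placeholder
```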
Step-by-Step Fix
Step 1: Use TransferConfig with proper retry and part settings
```python
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

# Configure botocore retries with a custom policy
config = Config(
    retries={
        "max_attempts": 10,
        "mode": "adaptive",  # "standard" or "adaptive"
    }
)

s3_client = boto3.client("s3", config=config)

# Configure multipart transfer
transfer_config = TransferConfig(
    multipart_threshold=50 * 1024 * 1024,  # 50MB - use multipart above this
    multipart_chunksize=25 * 1024 * 1024,  # 25MB per part (smaller = more resumable)
    max_concurrency=10,                    # Concurrent part uploads
    use_threads=True,
)
```
The adaptive retry mode adds client-side throttling in addition to exponential backoff, which is essential for S3 SlowDown responses.
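Chunk size also determines how many parts a file needs and caps the largest object you can upload, since S3 allows at most 10,000 parts per multipart upload. A quick sanity check (pure arithmetic, no AWS calls):

```python
import math

MAX_PARTS = 10_000  # S3 hard limit on parts per multipart upload

def parts_needed(file_size: int, chunk_size: int) -> int:
    """How many parts boto3 will upload for a file of this size."""
    return math.ceil(file_size / chunk_size)

def max_object_size(chunk_size: int) -> int:
    """Largest object uploadable at this chunk size, given the 10,000-part cap."""
    return MAX_PARTS * chunk_size

chunk = 25 * 1024 * 1024                   # 25 MiB, matching the TransferConfig above
print(parts_needed(5 * 1024**3, chunk))    # a 5 GiB file needs 205 parts
print(max_object_size(chunk) // 1024**3)   # ~244 GiB ceiling at 25 MiB parts
```

If your files approach that ceiling, raise `multipart_chunksize` before raising anything else.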
Step 2: Implement resumable upload with abort cleanup
```python import logging
logger = logging.getLogger(__name__)
```python
def upload_file_with_cleanup(bucket, key, filepath, transfer_config):
    """Upload file and abort multipart upload on failure."""
    try:
        s3_client.upload_file(
            Filename=filepath,
            Bucket=bucket,
            Key=key,
            Config=transfer_config,
            ExtraArgs={"ServerSideEncryption": "aws:kms"},
        )
        logger.info("Successfully uploaded %s to s3://%s/%s", filepath, bucket, key)
    except Exception as exc:
        logger.error("Upload failed for %s: %s", filepath, exc)
        abort_incomplete_uploads(bucket, key, max_age_hours=1)
        raise


def abort_incomplete_uploads(bucket, key, max_age_hours=1):
    """Abort multipart uploads older than max_age_hours for this key."""
    from datetime import datetime, timedelta, timezone

    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    response = s3_client.list_multipart_uploads(Bucket=bucket, Prefix=key)
    for upload in response.get("Uploads", []):
        if upload["Initiated"] < cutoff:
            s3_client.abort_multipart_upload(
                Bucket=bucket,
                Key=upload["Key"],
                UploadId=upload["UploadId"],
            )
            logger.info(
                "Aborted stale multipart upload: %s (id: %s)",
                upload["Key"],
                upload["UploadId"],
            )
```
Step 3: Configure S3 lifecycle rule to auto-abort incomplete uploads
Set up a bucket lifecycle rule to automatically abort multipart uploads older than a threshold:
```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "AbortIncompleteMultipartUploads",
        "Filter": {},
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {
          "DaysAfterInitiation": 1
        }
      }
    ]
  }'
```

This ensures that even if your application fails to clean up, S3 automatically aborts uploads after 1 day.
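The same rule can be applied from Python, which is convenient when bucket setup is managed in code. A minimal sketch; the bucket name is a placeholder, and the lazy boto3 import is just so the rule dictionary can be reused without AWS access:

```python
# Lifecycle rule mirroring the CLI example above.
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "AbortIncompleteMultipartUploads",
            "Filter": {},
            "Status": "Enabled",
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
        }
    ]
}

def apply_lifecycle(bucket: str) -> None:
    """Apply the abort-incomplete-uploads rule to a bucket."""
    import boto3  # lazy import so LIFECYCLE_CONFIG is importable without AWS
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=LIFECYCLE_CONFIG
    )

if __name__ == "__main__":
    apply_lifecycle("my-bucket")
```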
Prevention
- Use `multipart_chunksize=25MB` for files up to 5GB; increase to 100MB for very large files
- Set `max_concurrency` based on available network bandwidth (each concurrent part uses bandwidth)
- Enable S3 Transfer Acceleration for cross-region uploads
- Add CloudWatch alarms on `4xx` and `5xx` error rates for the S3 bucket
- Run a daily cron job to list and abort any multipart uploads older than 24 hours
- Use `boto3.set_stream_logger("botocore", logging.DEBUG)` to debug retry behavior in staging
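The daily cleanup job from the prevention list can be a short script. A sketch assuming credentials come from the environment; the staleness check is split into a pure helper (an illustrative design choice) so it is testable without AWS:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_stale(initiated: datetime, max_age_hours: float = 24,
             now: Optional[datetime] = None) -> bool:
    """True if a multipart upload started more than max_age_hours ago."""
    now = now or datetime.now(timezone.utc)
    return initiated < now - timedelta(hours=max_age_hours)

def abort_stale_uploads(bucket: str, max_age_hours: float = 24) -> int:
    """Abort every in-progress multipart upload older than the cutoff."""
    import boto3  # lazy import so is_stale stays testable offline
    s3 = boto3.client("s3")
    aborted = 0
    for page in s3.get_paginator("list_multipart_uploads").paginate(Bucket=bucket):
        for up in page.get("Uploads", []):
            if is_stale(up["Initiated"], max_age_hours):
                s3.abort_multipart_upload(
                    Bucket=bucket, Key=up["Key"], UploadId=up["UploadId"]
                )
                aborted += 1
    return aborted

if __name__ == "__main__":
    print(abort_stale_uploads("my-bucket"))  # bucket name is a placeholder
```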