Our exploration of S3's versatility continues as we focus on Amazon S3's multi-part upload feature. Implementing this feature with Boto3 demonstrates how to handle large files efficiently, and by the end of this lesson you will be more adept at managing sizable data sets.
Dividing a file into manageable parts makes the upload process more efficient and resilient. This is particularly useful when transferring large files or working over an unstable network connection. Amazon S3 recommends multi-part uploads for files larger than 100 MB, and they become a requirement for files exceeding 5 GB. Note that multi-part uploads require an S3 client, not an S3 resource.
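To make those thresholds concrete, here is a minimal sketch; the helper name and constants are our own for illustration, not part of Boto3:

```python
import os

# Thresholds from the lesson: multipart is recommended above 100 MB
# and required above the 5 GB single-request limit.
MULTIPART_RECOMMENDED = 100 * 1024 * 1024     # 100 MB
SINGLE_PUT_LIMIT = 5 * 1024 * 1024 * 1024     # 5 GB

def should_use_multipart(file_path):
    """Return True when a multi-part upload is recommended or required for this file."""
    size = os.path.getsize(file_path)
    if size > SINGLE_PUT_LIMIT:
        return True  # multi-part is mandatory beyond 5 GB
    return size > MULTIPART_RECOMMENDED
```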
To begin a multi-part upload, you need to create a multi-part upload session. Here is how you initiate a session and define the parts:
```python
import boto3
import os

# Configure the S3 client
s3_client = boto3.client('s3')
bucket_name = 'my-bucket'  # bucket names may not contain underscores
file_path = 'path/to/my_large_file.zip'
key = 'my_large_file.zip'

# Start a multi-part upload session
mpu = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)
parts = []
upload_id = mpu['UploadId']
```
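If you want to confirm the session was created, one optional check (a sketch assuming the same client and bucket as above) is to list the bucket's in-progress multi-part uploads and look for your key and upload ID:

```python
# Optional sanity check: the in-progress multi-part uploads for the bucket
# should now include our key and the upload ID returned above.
in_progress = s3_client.list_multipart_uploads(Bucket=bucket_name)
for upload in in_progress.get('Uploads', []):
    print(upload['Key'], upload['UploadId'])
```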
Next, we calculate the number of parts and read and upload them one by one. Every part except the last must be at least 5 MB and no more than 5 GB.
```python
# Set the part size to 5 MB
part_size = 1024 * 1024 * 5
# Retrieve the total file size
file_size = os.path.getsize(file_path)
# Calculate the total number of parts needed for the upload
part_count = (file_size + part_size - 1) // part_size

# Open the file in binary read mode
with open(file_path, 'rb') as f:
    # Iterate over each part
    for part_no in range(1, part_count + 1):
        # Read the specified part size from the file
        data = f.read(part_size)
        # Upload the part to S3 and receive a response containing the ETag
        response = s3_client.upload_part(
            Bucket=bucket_name,
            Key=key,
            PartNumber=part_no,
            UploadId=upload_id,
            Body=data
        )
        # Append the part number and ETag to the 'parts' list
        # for later reference when completing the upload
        parts.append({'PartNumber': part_no, 'ETag': response['ETag']})
```
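If a part fails mid-way, the incomplete upload keeps occupying storage (and incurring charges) until it is aborted or completed. A defensive variant of the same loop, shown here as a sketch rather than part of the lesson's main flow, wraps the uploads in a try/except and calls `abort_multipart_upload` before re-raising:

```python
# Defensive variant of the upload loop: abort the session if any part fails,
# so S3 does not keep the orphaned parts around.
try:
    with open(file_path, 'rb') as f:
        for part_no in range(1, part_count + 1):
            data = f.read(part_size)
            response = s3_client.upload_part(
                Bucket=bucket_name, Key=key, PartNumber=part_no,
                UploadId=upload_id, Body=data
            )
            parts.append({'PartNumber': part_no, 'ETag': response['ETag']})
except Exception:
    s3_client.abort_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id)
    raise
```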
After all the parts have been uploaded, you need to indicate that the upload process is complete:
```python
# Complete the multi-part upload
s3_client.complete_multipart_upload(
    Bucket=bucket_name,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={'Parts': parts}
)
```
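As a final, optional verification (our own addition, assuming the same client and variables as above), you can compare the assembled object's size in S3 with the local file:

```python
# Optional verification: the completed object's size should match the local file.
head = s3_client.head_object(Bucket=bucket_name, Key=key)
print(head['ContentLength'] == os.path.getsize(file_path))  # expected: True
```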
We delved into Amazon S3's multi-part upload functionality and implemented it using Boto3. By understanding multi-part uploads, we can handle the upload of large files more effectively. Expect some fun exercises ahead where we get our hands dirty with multi-part uploads.