Python: Automating S3 data uploads with AWS Boto3
Boto3 is the AWS SDK for Python, used to create, configure, and manage AWS services.
I first picked up boto3 during my active learning period around 2014–2015. But most of the enterprises I worked in were not yet on AWS back then.
In the past two years, I got the chance to use it to revamp a legacy Excel-CSV and manual-upload process at work. For the analysts consuming the data, this isn’t visible or valuable. But for me, who unfortunately landed in the situation, Boto3 saves me some time, especially when the influx of legacy manual upload tasks suddenly came like a tsunami in Q3 2022.
With this revamp, the transformation is done in Python instead of Excel, and the upload to S3 is done with boto3. I use Python because, as an analyst in this organization, I am not granted permission to use the CLI. But we are an organization with analysts at slightly different levels of data maturity, and I can see why it’s hard to entrust more privileges to some of the analysts here.
One can further schedule the Python code to run on a specific day of the month. But we had to skip this part for the team at work because of the frequent data adjustment requests and the fact that not all stakeholders respect written SLAs. As a result, there are instances where future data has to be estimated before the actual data is available for upload, and this estimation process is inconsistent across months and not worth automating.
Anyway, reading documentation might not be for everyone, so here are some simple snippets to get started.
First, we need the s3_uri where the CSV file will be uploaded. It is then broken down into the bucket and s3_file_key parameters.
# s3_uri
# s3://s3_bucket/folder/subfolder/
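If you would rather not split the URI by hand, here is a minimal sketch of doing it with the standard library. The helper name `split_s3_uri` is my own and not part of boto3; it assumes the URI follows the `s3://bucket/prefix/` shape shown above.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/').

    Hypothetical helper for illustration; boto3 does not ship this function.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('not an s3 uri: ' + s3_uri)
    bucket = parsed.netloc
    # the path starts with '/', which must not be part of the object key
    prefix = parsed.path.lstrip('/')
    return bucket, prefix


bucket, prefix = split_s3_uri('s3://s3_bucket/folder/subfolder/')
print(bucket)  # s3_bucket
print(prefix)  # folder/subfolder/
```

The prefix keeps its trailing slash, so it can be joined directly with a file name to form the s3_file_key.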
Next come the module import and the variable declarations.
# import module
import boto3
import io
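# declaring the variables
var_input_file_path = ''  # local folder holding the csv, including the trailing slash
var_filename = ''  # file name without the .csv extension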
Here’s how a simple upload can be made.
# upload
bucket = 's3_bucket'
s3_file_key = 'folder/subfolder/' + var_filename + '.csv'
file_name = var_input_file_path + var_filename + '.csv'
s3 = boto3.resource('s3')
# open the file in binary mode and close it once the upload finishes
with open(file_name, 'rb') as data:
    s3.Object(bucket, s3_file_key).put(Body=data)
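As an alternative, the boto3 client has an `upload_file` method that opens the local file itself and switches to multipart uploads for large files. Below is a rough sketch of wrapping it. The function name `upload_csv` and its parameters are my own choices for illustration, and it builds the local path with `os.path.join` rather than plain string concatenation.

```python
import os


def upload_csv(s3_client, input_dir, filename, bucket, prefix):
    """Upload <input_dir>/<filename>.csv to s3://<bucket>/<prefix><filename>.csv.

    Hypothetical wrapper; s3_client is expected to be boto3.client('s3').
    """
    local_path = os.path.join(input_dir, filename + '.csv')
    s3_file_key = prefix + filename + '.csv'
    # upload_file(Filename, Bucket, Key) opens and closes the local file itself
    s3_client.upload_file(local_path, bucket, s3_file_key)
    return s3_file_key


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# upload_csv(boto3.client('s3'), '/path/to/input', 'my_file',
#            's3_bucket', 'folder/subfolder/')
```

Passing the client in as an argument also makes the function easy to try out with a stand-in object before pointing it at a real bucket.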
And should you need to empty the bucket folder before uploading, this snippet will come in handy.
# empty the s3 bucket/folder/
s3_client = boto3.client('s3')
bucket = 's3_bucket'
prefix = 'folder/subfolder/'
response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
# 'Contents' is missing when nothing matches the prefix; one call returns at most 1,000 keys
for obj in response.get('Contents', []):
    print('Deleting', obj['Key'])
    s3_client.delete_object(Bucket=bucket, Key=obj['Key'])
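A single `list_objects_v2` call returns at most 1,000 keys, so a folder with more files than that needs pagination. Here is a sketch, assuming you are allowed to call `delete_objects`, that walks every page and deletes each page in one batch request. The function name `empty_prefix` is mine, and the sketch does not inspect the per-key errors that `delete_objects` can report in its response.

```python
def empty_prefix(s3_client, bucket, prefix):
    """Delete every object under the prefix and return the deleted keys.

    Hypothetical helper; s3_client is expected to be boto3.client('s3').
    """
    deleted = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches the prefix
        keys = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
        if not keys:
            continue
        # delete_objects takes up to 1,000 keys per request, same as one page
        s3_client.delete_objects(Bucket=bucket, Delete={'Objects': keys})
        deleted.extend(item['Key'] for item in keys)
    return deleted


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# for key in empty_prefix(boto3.client('s3'), 's3_bucket', 'folder/subfolder/'):
#     print('Deleted', key)
```

Be careful with the prefix: an empty string matches every object in the bucket.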
And that’s how we upload using AWS Boto3. It’s as simple as that.
Till the next time! Tschüss!
Reference:
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html