This article shows how to read and write files on S3 using the s3fs library. It lets you pass S3 paths directly to pandas methods such as to_csv, read_csv, and other similar methods.
Imports
import os
import pandas as pd
import s3fs
Environment variables
The best way to set up your environment variables is to declare them inside your Saagie project. This way, they can be modified easily, and your credentials are not stored in git when you use version control on your project.
# Ideally, the environment variables should be set outside of the .py file, inside your Saagie project
# Credentials (placeholder values, replace with your own)
key = 'BLKIUG450KFBB'
secret = 'oihKJFuhfuh/953oiof'
region = 'eu-west-3'
# Setup credentials
os.environ['AWS_ACCESS_KEY_ID'] = key
os.environ['AWS_SECRET_ACCESS_KEY'] = secret
os.environ['AWS_DEFAULT_REGION'] = region
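When the variables are declared in your Saagie project, they are injected into the job's environment, so the hardcoded block above is not needed at all. A minimal sketch, assuming the standard AWS variable names shown above are the ones declared in the project:
# Assumption: the AWS_* variables are already declared in the Saagie project,
# so we simply read them back instead of hardcoding credentials in the script
key = os.environ['AWS_ACCESS_KEY_ID']
secret = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['AWS_DEFAULT_REGION']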
Read and write
# Skip to here if you have already set up your environment variables
# File parameters
s3_file = 's3://bucket-name/path/to/file/titanic.csv'
# Import example file
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
##### Write file #####
# pandas delegates the S3 upload to s3fs when the path starts with 's3://'
df.to_csv(s3_file, index=False)  # index=False avoids writing an extra index column
##### Read file #####
df_s3 = pd.read_csv(s3_file)
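You can also call s3fs directly, for example to check that the file was actually written; S3FileSystem picks up the same AWS_* environment variables set above. A quick sketch, assuming the same placeholder bucket and path as s3_file:
# Assumption: same placeholder bucket/path as s3_file above
fs = s3fs.S3FileSystem()
print(fs.ls('bucket-name/path/to/file'))  # the listing should include titanic.csv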