This article shows you how to read and write files on AWS S3 from R, using arrow for fast Parquet and CSV I/O, and aws.s3 for all other kinds of files.
The best way to set up your environment variables is to declare them inside your Saagie project. That way they can be modified easily, and your credentials are not committed to git when you use version control on your project.
If that is not possible for you, the following lines show how to declare the environment variables directly from your R code (not recommended):
key <- 'BLKIUG450KFBB'
secret <- 'oihKJFuhfuh/953oiof'
region <- 'eu-west-3'
Sys.setenv(AWS_ACCESS_KEY_ID = key, AWS_SECRET_ACCESS_KEY = secret, AWS_DEFAULT_REGION = region)
bucket_name <- 'saagie-service'
object_name <- 'documentation-s3/doc-r/iris.csv'
arrow is a library that can write CSV and Parquet files, both locally and directly to S3.
library(arrow)
# Get bucket
bucket <- s3_bucket(bucket_name)
# Create path to file
path <- bucket$path(object_name)
# Write csv file to the path created
write_csv_arrow(iris, path)
# Read file from path
iris2 <- read_csv_arrow(path)
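The article also mentions Parquet, which arrow handles the same way through the bucket path. A minimal sketch, assuming the same bucket as above (the `iris.parquet` object name is made up for illustration):

```r
# Hypothetical object name for the Parquet copy
parquet_path <- bucket$path('documentation-s3/doc-r/iris.parquet')

# Write a Parquet file directly to S3
write_parquet(iris, parquet_path)

# Read it back into a data frame
iris_pq <- read_parquet(parquet_path)
```

Parquet is a columnar binary format, so it is usually both smaller and faster to read than CSV for tabular data.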
aws.s3 is a library that can interact with S3 in several ways. It is slower than arrow, but it offers more functionality.
library(aws.s3)
library(data.table) # needed to read/write files from/to RAM
# Upload file from RAM
s3write_using(iris, FUN = fwrite, object = object_name, bucket = bucket_name)
# Read file from S3 to RAM
iris3 <- s3read_using(FUN = fread, object = object_name, bucket = bucket_name)
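Because `s3write_using`/`s3read_using` accept any read/write function, you are not limited to CSV. For arbitrary R objects (models, lists, ...), aws.s3 also provides dedicated RDS helpers; a short sketch (the `iris.rds` object name is an assumption for illustration):

```r
# Serialize any R object to S3 as an .rds file
s3saveRDS(iris, object = 'documentation-s3/doc-r/iris.rds', bucket = bucket_name)

# Deserialize it back into memory
iris_rds <- s3readRDS(object = 'documentation-s3/doc-r/iris.rds', bucket = bucket_name)
```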
# List available buckets
bucketlist()
# List files in a bucket
get_bucket(bucket_name)
##### Upload file from disk #####
# Write file to disk
write.csv(iris, 'iris.csv', row.names = FALSE)
put_object(file = 'iris.csv', object = object_name, bucket = bucket_name)
# Alternative way to read to RAM
iris4 <- data.table::fread(rawToChar(get_object(object = object_name, bucket = bucket_name)))
##### Read file to disk #####
# Write the raw binary to disk and then read it. No additional library needed
writeBin(get_object(object = object_name, bucket = bucket_name, as = 'raw'), con = 'iris5.csv')
iris5 <- read.csv('iris5.csv')
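If you just want the object as a local file, aws.s3 also has `save_object`, which downloads it in one call without the intermediate `writeBin` step. A minimal sketch (the `iris6.csv` filename is made up for illustration):

```r
# Download the S3 object straight to a local file
save_object(object = object_name, bucket = bucket_name, file = 'iris6.csv')
iris6 <- read.csv('iris6.csv')
```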