Reading and Writing Files in NaaVRE
To read and write files in NaaVRE so that they are available both in the Jupyter notebook and in the workflow, use an external storage service such as MinIO/S3. The file system of the Jupyter notebook and the file system used by the cells during workflow execution are not the same. As a result, a file created in the Jupyter notebook's local file system is not available in the workflow.
Example
To read or write a file in NaaVRE, first upload it to the external storage. Then, in the Jupyter notebook, write code that reads the file, processes it, and writes the results back to the external storage. For example, to read a file from and write a file to MinIO/S3 in Python, you can use the following code:
```python
# Configuration (do not containerize this cell)
param_minio_endpoint = "MINIO_ENDPOINT"  # Replace with your MinIO endpoint, e.g., "myhost:9000"
param_minio_user_prefix = "myname@gmail.com"  # Your personal folder in the naa-vre-user-data bucket in MinIO
secret_minio_access_key = "MINIO_ACCESSKEY"  # Replace with your actual MinIO access key
secret_minio_secret_key = "MINIO_SECRETKEY"  # Replace with your actual MinIO secret key

# Access MinIO files
from minio import Minio
mc = Minio(endpoint=param_minio_endpoint,
           access_key=secret_minio_access_key,
           secret_key=secret_minio_secret_key)  # Add secure=False if your endpoint does not use HTTPS

# List existing buckets: get a list of all available buckets
mc.list_buckets()

# Download file from bucket: download `myfile.csv` from your personal folder on MinIO and save it locally as `myfile_downloaded.csv`
mc.fget_object(bucket_name="naa-vre-user-data", object_name=f"{param_minio_user_prefix}/myfile.csv", file_path="myfile_downloaded.csv")

# Process file: for example, read the CSV file using pandas and save the results locally as `myfile_local.csv`
import pandas as pd
df = pd.read_csv("myfile_downloaded.csv")
# ... process `df` here ...
df.to_csv("myfile_local.csv", index=False)

# Upload file to bucket: upload `myfile_local.csv` to your personal folder on MinIO as `myfile.csv`
mc.fput_object(bucket_name="naa-vre-user-data", file_path="myfile_local.csv", object_name=f"{param_minio_user_prefix}/myfile.csv")
```
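Because the notebook and the workflow cells do not share a local file system, it can help to treat every local file as temporary and always go through MinIO. The sketch below wraps the download, process, and upload steps in a single function. The helper names `object_key` and `round_trip` are hypothetical (they are not part of NaaVRE or of the `minio` package). The function only assumes a client that offers `fget_object` and `fput_object`, such as the `mc` object created in the configuration code.

```python
import os
import tempfile
from typing import Any, Callable, Protocol


class ObjectStore(Protocol):
    """The subset of the minio.Minio client that round_trip() relies on."""

    def fget_object(self, bucket_name: str, object_name: str, file_path: str) -> Any: ...

    def fput_object(self, bucket_name: str, object_name: str, file_path: str) -> Any: ...


def object_key(user_prefix: str, filename: str) -> str:
    """Hypothetical helper: build an object name inside the user's personal folder."""
    return f"{user_prefix.rstrip('/')}/{filename.lstrip('/')}"


def round_trip(
    client: ObjectStore,
    bucket: str,
    user_prefix: str,
    src_name: str,
    dst_name: str,
    process: Callable[[str, str], None],
) -> str:
    """Hypothetical helper: download src_name, run process(in_path, out_path),
    and upload the output as dst_name. Returns the uploaded object name.

    Local files live in a temporary directory that is deleted afterwards,
    so no later cell can accidentally depend on them.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input")
        out_path = os.path.join(tmp, "output")
        # Download the source object from the user's folder
        client.fget_object(bucket_name=bucket, object_name=object_key(user_prefix, src_name), file_path=in_path)
        # User-supplied processing step: reads in_path, writes out_path
        process(in_path, out_path)
        # Upload the result back to the user's folder
        dst_key = object_key(user_prefix, dst_name)
        client.fput_object(bucket_name=bucket, object_name=dst_key, file_path=out_path)
    return dst_key
```

For example, with pandas imported as `pd`, `round_trip(mc, "naa-vre-user-data", param_minio_user_prefix, "myfile.csv", "myfile_processed.csv", lambda src, dst: pd.read_csv(src).to_csv(dst, index=False))` would pass `myfile.csv` through pandas and store the result as `myfile_processed.csv` (a hypothetical name) in your personal folder. Passing the client in, rather than creating it inside the function, keeps the credentials in the configuration cell and lets you test the function with a stand-in object.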
Similarly, to read a file from and write a file to MinIO/S3 in R, you can use the following code:
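```r
# Configuration (do not containerize this cell)
param_minio_endpoint = "MINIO_ENDPOINT"  # Replace with your MinIO endpoint, e.g., "myhost:9000"
param_minio_region = "nl-uvalight"  # Region of your MinIO instance
param_minio_user_prefix = "myname@gmail.com"  # Your personal folder in the naa-vre-user-data bucket in MinIO
secret_minio_access_key = "MINIO_ACCESSKEY"  # Replace with your actual MinIO access key
secret_minio_secret_key = "MINIO_SECRETKEY"  # Replace with your actual MinIO secret key

# Access MinIO files
install.packages(c("aws.s3", "readr"))
library("aws.s3")
Sys.setenv(
  "AWS_S3_ENDPOINT" = param_minio_endpoint,
  "AWS_DEFAULT_REGION" = param_minio_region,
  "AWS_ACCESS_KEY_ID" = secret_minio_access_key,
  "AWS_SECRET_ACCESS_KEY" = secret_minio_secret_key
)

# List existing buckets
bucketlist()

# Download file from MinIO: save `myfile.csv` from your personal folder locally as `myfile_downloaded.csv`
save_object(
  bucket = "naa-vre-user-data",
  object = paste0(param_minio_user_prefix, "/myfile.csv"),
  file = "myfile_downloaded.csv"
)

# Process file: for example, read the CSV file using readr and save the results locally as `myfile_local.csv`
library("readr")
data <- read_csv("myfile_downloaded.csv")
# ... process `data` here ...
write_csv(data, "myfile_local.csv")

# Upload file to MinIO: upload `myfile_local.csv` to your personal folder as `myfile.csv`
put_object(
  bucket = "naa-vre-user-data",
  file = "myfile_local.csv",
  object = paste0(param_minio_user_prefix, "/myfile.csv")
)
```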
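In Python, if you would rather not write local files at all, the `minio` client can also work with in-memory data: `get_object` returns a response whose bytes you can read, and `put_object` uploads from a file-like object of known length. The sketch below assumes the object fits in memory; `read_bytes` and `write_bytes` are hypothetical helper names, not part of NaaVRE or of the `minio` package.

```python
import io
from typing import Any


def read_bytes(client: Any, bucket: str, key: str) -> bytes:
    """Hypothetical helper: read a whole object into memory with Minio.get_object."""
    response = client.get_object(bucket_name=bucket, object_name=key)
    try:
        return response.read()
    finally:
        # The minio client expects callers to close the response and release the connection
        response.close()
        response.release_conn()


def write_bytes(
    client: Any,
    bucket: str,
    key: str,
    data: bytes,
    content_type: str = "application/octet-stream",
) -> None:
    """Hypothetical helper: upload an in-memory payload with Minio.put_object."""
    client.put_object(
        bucket_name=bucket,
        object_name=key,
        data=io.BytesIO(data),  # put_object reads from a file-like object
        length=len(data),       # and needs the payload size up front
        content_type=content_type,
    )
```

For example, `pd.read_csv(io.BytesIO(read_bytes(mc, "naa-vre-user-data", f"{param_minio_user_prefix}/myfile.csv")))` would load the CSV file straight into a pandas DataFrame. For large files, the `fget_object`/`fput_object` approach is the safer choice, because it does not hold the whole object in memory.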