GitHub page: exemple-pyspark-read-and-write
Common part
Library dependencies
from pyspark.sql import SparkSession
Creating the Spark session
sparkSession = SparkSession.builder.appName("example-pyspark-read-and-write").getOrCreate()
How to write a file to HDFS?
Code example
# Create data
data = [('First', 1), ('Second', 2), ('Third', 3), ('Fourth', 4), ('Fifth', 5)]
df = sparkSession.createDataFrame(data)
# Write into HDFS
df.write.csv("hdfs://cluster/user/hdfs/test/example.csv")
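By default, the CSV writer raises an error if the target path already exists, and it writes no header row. The following is a minimal sketch, assuming you want to replace any previous output and keep the column names in the file. The helper name write_csv_overwrite is hypothetical and is not part of the original example; it expects a DataFrame such as the df created above.

```python
def write_csv_overwrite(df, path):
    """Write df as CSV to path, replacing existing output and adding a header row."""
    (
        df.write
        .mode("overwrite")         # replace the target directory if it already exists
        .option("header", "true")  # first line of each part file holds the column names
        .csv(path)
    )


# Usage with the DataFrame from the example above (same path as the original):
# write_csv_overwrite(df, "hdfs://cluster/user/hdfs/test/example.csv")
```

Note that Spark writes a directory at the given path (here named example.csv) containing one or more part files, not a single CSV file.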
How to read a file from HDFS?
Code example
This code displays only the first 20 records of the file, because show() prints 20 rows by default.
# Read from HDFS
df_load = sparkSession.read.csv('hdfs://cluster/user/hdfs/test/example.csv')
df_load.show()
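The read above returns every column as a string, with generated names such as _c0 and _c1, because no header or schema is given. The following is a small sketch, assuming you want Spark to infer the column types and to control how many rows are printed. The helper name read_csv_preview and its parameter n are hypothetical; spark stands for the sparkSession created in the common part.

```python
def read_csv_preview(spark, path, n=5):
    """Load a CSV from path, print its first n rows, and return the DataFrame."""
    # header=False matches a file written without a header row;
    # inferSchema=True makes Spark scan the data to guess column types.
    df_load = spark.read.csv(path, header=False, inferSchema=True)
    df_load.show(n)  # print n rows instead of the default 20
    return df_load


# Usage with the session from the common part (same path as the original):
# df_load = read_csv_preview(sparkSession, "hdfs://cluster/user/hdfs/test/example.csv")
```

If the file was written with a header row, pass header=True instead so that the first line is used for column names.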
How to use on Saagie?
Please refer to the Python application packaging guidelines
How to use on Saagie's Jupyter Notebooks?
Before creating the Spark session, add the following snippet:
import os
os.environ["HADOOP_USER_NAME"] = "hdfs"
os.environ["PYTHON_VERSION"] = "3.5.2"
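The ordering matters here: the variables must be set before the session is built. The following sketch puts both steps in one notebook cell. The function names configure_hadoop_env and create_session are hypothetical, and the values are the ones used in the snippet above; adjust them to your platform if they differ.

```python
import os


def configure_hadoop_env(user="hdfs", python_version="3.5.2"):
    """Set the environment variables from the snippet above."""
    os.environ["HADOOP_USER_NAME"] = user
    os.environ["PYTHON_VERSION"] = python_version


def create_session(app_name="example-pyspark-read-and-write"):
    """Configure the environment first, then build (or reuse) the Spark session."""
    configure_hadoop_env()
    # Imported here so the environment is already set when PySpark starts.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


# Usage in a notebook cell:
# sparkSession = create_session()
```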