GitHub Page : example-spark-scala-read-and-write-from-hdfs
Common part
sbt Dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
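In a full build, these dependency lines would sit in a `build.sbt` alongside a project name and Scala version. A minimal sketch (the Scala version is an assumption; Spark 2.4.0 supports Scala 2.11 and 2.12):

```scala
// Hypothetical minimal build.sbt for this example
name := "example-spark-scala-read-and-write-from-hdfs"
scalaVersion := "2.11.12" // assumption: any Scala version supported by Spark 2.4.0 works

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
```

The `provided` scope keeps the Spark jars out of the assembly, since a Spark cluster supplies them at runtime.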
Creating Spark Session
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-hdfs")
  .getOrCreate()
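The read and write snippets below reference an `hdfs_master` value that is never defined in this page. A minimal sketch of how it might be declared (the host and port are placeholders; substitute your cluster's NameNode URI):

```scala
// Hypothetical HDFS NameNode URI; replace host and port with your cluster's values
val hdfs_master = "hdfs://namenode:8020/"
```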
How to write a file into HDFS?
Code example
// Defining a HelloWorld case class
case class HelloWorld(message: String)

// ====== Creating a DataFrame with 1 partition
import sparkSession.implicits._ // required for the toDF() conversion
val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)
// ======= Writing files
import org.apache.spark.sql.SaveMode

// Writing the DataFrame as a Parquet file
df.write.mode(SaveMode.Overwrite).parquet(hdfs_master + "user/hdfs/wiki/testwiki")
// Writing the DataFrame as a CSV file
df.write.mode(SaveMode.Overwrite).csv(hdfs_master + "user/hdfs/wiki/testwiki.csv")
How to read a file from HDFS?
Code example
// ======= Reading files
// Reading Parquet files into a Spark DataFrame
val df_parquet = sparkSession.read.parquet(hdfs_master + "user/hdfs/wiki/testwiki")
// Reading CSV files into a Spark DataFrame
val df_csv = sparkSession.read.option("inferSchema", "true").csv(hdfs_master + "user/hdfs/wiki/testwiki.csv")
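Because `inferSchema` makes an extra pass over the data to guess column types, a schema can instead be supplied explicitly. A sketch, where the single column name `message` is an assumption matching the `HelloWorld` case class written above:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Explicit schema avoids the extra type-inference pass over the CSV data
val schema = StructType(Seq(StructField("message", StringType, nullable = true)))
val df_csv_typed = sparkSession.read.schema(schema).csv(hdfs_master + "user/hdfs/wiki/testwiki.csv")
```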