To run a Spark Scala job on Saagie, you must package your code with the sbt-assembly plugin.
Assembly plugin
You need the assembly plugin to package your code into a fat jar containing all of your project dependencies. To keep the jar from becoming too heavy, we recommend marking the Spark dependencies as "provided" in your build.sbt file (see the example below), since these dependencies are already available on Saagie.
// In project/assembly.sbt: declare the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

// In build.sbt: exclude the Scala library from the fat jar,
// since it is already provided by the Spark runtime on the platform
import sbt.Keys._
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
Sbt file
Example:
name := "my-spark-application"
version := "0.1"
scalaVersion := "2.11.12"
val SPARK_VERSION = "2.4.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % SPARK_VERSION % "provided",
"org.apache.spark" %% "spark-sql" % SPARK_VERSION % "provided"
)
assemblyMergeStrategy in assembly := {
  // Discard entries under META-INF (manifests, index lists, dependency files, ...)
  case PathList("META-INF", xs @ _*) =>
    (xs map { _.toLowerCase }) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.discard
    }
  // Concatenate application configuration files instead of keeping only one
  case "conf/application.conf" => MergeStrategy.concat
  // For any other duplicate file, keep the first occurrence
  case _ => MergeStrategy.first
}

// Skip tests when building the assembly
test in assembly := {}

// Run tests sequentially rather than in parallel
parallelExecution in Test := false
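For context, here is a minimal sketch of an application entry point that such a build would package. The object name Example and the toy DataFrame are purely illustrative; the master and other Spark settings are normally supplied at submission time rather than hard-coded.

import org.apache.spark.sql.SparkSession

object Example {
  def main(args: Array[String]): Unit = {
    // Master, executor settings, etc. are expected to come from spark-submit,
    // so only the application name is set here.
    val spark = SparkSession.builder()
      .appName("my-spark-application")
      .getOrCreate()

    import spark.implicits._

    // Trivial DataFrame just to check that the packaged job runs end to end.
    val data = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    data.show()

    spark.stop()
  }
}

Because spark-core and spark-sql are marked as "provided", this code compiles against them locally but they are not bundled into the fat jar.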