Packaging and Launching Scala Spark Project with Maven

I am writing an application in Scala that uses Spark . I am packing my application using Maven and am having trouble creating a "uber" or "fat" jar .

The problem I am facing is that running the application works fine inside the IDE or if I provide a non-uber-jar'd version of the dependencies as the java class path, but this does not work if I give the uber jar as the class path , i.e.

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

does not work. The following error message appears:

 ERROR SparkContext: Error initializing SparkContext. com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version' at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159) at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164) at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206) at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168) at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504) at akka.actor.ActorSystem$.apply(ActorSystem.scala:141) at akka.actor.ActorSystem$.apply(ActorSystem.scala:118) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56) at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245) at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) at org.apache.spark.SparkContext.<init>(SparkContext.scala:424) at debug.spark_example.Example$.main(Example.scala:9) at debug.spark_example.Example.main(Example.scala) 

I am very grateful for the understanding of what I need to add to the pom.xml file, and why I need to add it to make this work.

I searched on the Internet and found the following resources that I tried (see in pom) but could not work:

1) Spark User mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html

2) how to pack a Scala spark application

I have a simple example demonstrating this problem, a simple class 1 project (src / main / scala / debug / spark_example / Example.scala):

 package debug.spark_example import org.apache.spark.{SparkConf, SparkContext} object Example { def main(args: Array[String]): Unit = { val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]")) val lines = sc.textFile(args(0)) val lineLengths = lines.map(s => s.length) val totalLength = lineLengths.reduce((a, b) => a + b) lineLengths.foreach(println) println(totalLength) } } 

Here is the pom.xml file:

 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>debug.spark-example</groupId> <artifactId>spark-example</artifactId> <version>0.1-SNAPSHOT</version> <inceptionYear>2015</inceptionYear> <properties> <scala.majorVersion>2.11</scala.majorVersion> <scala.minorVersion>.2</scala.minorVersion> <spark.version>1.4.1</spark.version> </properties> <repositories> <repository> <id>scala-tools.org</id> <name>Scala-Tools Maven2 Repository</name> <url>http://scala-tools.org/repo-releases</url> </repository> </repositories> <pluginRepositories> <pluginRepository> <id>scala-tools.org</id> <name>Scala-Tools Maven2 Repository</name> <url>http://scala-tools.org/repo-releases</url> </pluginRepository> </pluginRepositories> <dependencies> <dependency> <groupId>org.scala-lang</groupId> <artifactId>scala-library</artifactId> <version>${scala.majorVersion}${scala.minorVersion}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.majorVersion}</artifactId> <version>${spark.version}</version> </dependency> </dependencies> <build> <sourceDirectory>src/main/scala</sourceDirectory> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artifactId>maven-scala-plugin</artifactId> <executions> <execution> <goals> <goal>compile</goal> <goal>testCompile</goal> </goals> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-eclipse-plugin</artifactId> <configuration> <downloadSources>true</downloadSources> <buildcommands> <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand> </buildcommands> <additionalProjectnatures> <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature> </additionalProjectnatures> <classpathContainers> <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer> <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer> </classpathContainers> </configuration> </plugin> <plugin> <artifactId>maven-assembly-plugin</artifactId> <version>2.4</version> <executions> <execution> <id>make-assembly</id> <phase>package</phase> <goals> <goal>attached</goal> </goals> </execution> </executions> <configuration> <tarLongFileMode>gnu</tarLongFileMode> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.2</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <minimizeJar>false</minimizeJar> <createDependencyReducedPom>false</createDependencyReducedPom> <artifactSet> <includes> <!-- Include here the dependencies you want to be packed in your fat jar --> <include>*:*</include> </includes> </artifactSet> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> <resource>reference.conf</resource> </transformer> </transformers> </configuration> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>2.7</version> <configuration> <skipTests>true</skipTests> </configuration> </plugin> </plugins> </build> <reporting> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artifactId>maven-scala-plugin</artifactId> </plugin> </plugins> </reporting> </project> 

Many thanks for your help.

+7
scala maven akka apache-spark
source share
3 answers

It seems that Spark submit script should be used to run the program.

Instead

 java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

Do something like:

 <path-to>/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt 

It also works without a shaded can; with only jar-with-dependencies. The following pom.xml file worked for me:

 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>debug.spark-example</groupId> <artifactId>spark-example</artifactId> <version>0.1-SNAPSHOT</version> <inceptionYear>2015</inceptionYear> <properties> <scala.majorVersion>2.11</scala.majorVersion> <scala.minorVersion>.2</scala.minorVersion> <spark.version>1.4.1</spark.version> </properties> <repositories> <repository> <id>scala-tools.org</id> <name>Scala-Tools Maven2 Repository</name> <url>http://scala-tools.org/repo-releases</url> </repository> </repositories> <pluginRepositories> <pluginRepository> <id>scala-tools.org</id> <name>Scala-Tools Maven2 Repository</name> <url>http://scala-tools.org/repo-releases</url> </pluginRepository> </pluginRepositories> <dependencies> <dependency> <groupId>org.scala-lang</groupId> <artifactId>scala-library</artifactId> <version>${scala.majorVersion}${scala.minorVersion}</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.majorVersion}</artifactId> <version>${spark.version}</version> </dependency> </dependencies> <build> <sourceDirectory>src/main/scala</sourceDirectory> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artifactId>maven-scala-plugin</artifactId> <executions> <execution> <goals> <goal>compile</goal> <goal>testCompile</goal> </goals> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-eclipse-plugin</artifactId> <configuration> <downloadSources>true</downloadSources> <buildcommands> <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand> </buildcommands> <additionalProjectnatures> <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature> </additionalProjectnatures> <classpathContainers> <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer> <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer> </classpathContainers> </configuration> </plugin> <plugin> <artifactId>maven-assembly-plugin</artifactId> <version>2.4</version> <executions> <execution> <id>make-assembly</id> <phase>package</phase> <goals> <goal>attached</goal> </goals> </execution> </executions> <configuration> <tarLongFileMode>gnu</tarLongFileMode> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <version>2.7</version> <configuration> <skipTests>true</skipTests> </configuration> </plugin> </plugins> </build> <reporting> <plugins> <plugin> <groupId>org.scala-tools</groupId> <artifactId>maven-scala-plugin</artifactId> </plugin> </plugins> </reporting> </project> 
+3
source share

This may have something to do with the order of your maven plugins. You use the maven-assembly-plugin and maven-shade-plugin plugins in your project, which are associated with the same phase in the maven life cycle. When this happens, maven executes the plugins in the order in which they appear in the plugins section, so in your case it executes the build plugin, then the plugin with a touch.

Based on the output jar you are trying to launch and the shadow transform you have, you probably need the opposite order. However, you may not even need the build plugin for your use case. You might be able to use the target/spark-example-0.1-SNAPSHOT-shaded.jar .

 <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <!-- SNIP --> </plugin> <plugin> <artifactId>maven-assembly-plugin</artifactId> <!-- SNIP --> </plugin> </plugins> 
0
source share

Akka Docs helped me fix this problem. If you are using Shade, then you must specify a transformer.

  <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> <resource>reference.conf</resource> </transformer> <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> <manifestEntries> <Main-Class>akka.Main</Main-Class> </manifestEntries> </transformer> 
0
source share

All Articles