How to run a Spark Java program from the command line

I want to run the word count Java program on Spark. How do I run it from the command line?

2 answers

Take the word count example from, say: https://github.com/holdenk/fastdataprocessingwithsparkexamples/tree/master/src/main/scala/pandaspark/examples . Follow these steps to create a fat jar:

    mkdir example-java-build/; cd example-java-build
    mvn archetype:generate \
        -DarchetypeGroupId=org.apache.maven.archetypes \
        -DgroupId=spark.examples \
        -DartifactId=JavaWordCount \
        -Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart
    cp ../examples/src/main/java/spark/examples/JavaWordCount.java \
        JavaWordCount/src/main/java/spark/examples/JavaWordCount.java
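For reference, here is a minimal sketch of what such a JavaWordCount.java can look like against the Spark 1.x Java API. It is not a copy of the file in the repository above, which may differ in details:

    package spark.examples;

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;

    import scala.Tuple2;

    // Minimal word count sketch for the Spark 1.x Java API.
    public final class JavaWordCount {
        public static void main(String[] args) throws Exception {
            if (args.length < 1) {
                System.err.println("Usage: JavaWordCount <file>");
                System.exit(1);
            }

            // The master URL is not hard-coded; spark-submit supplies it.
            SparkConf conf = new SparkConf().setAppName("JavaWordCount");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read the input file (e.g. an HDFS path passed as the first argument).
            JavaRDD<String> lines = sc.textFile(args[0]);

            // Split each line into words.
            JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
                public Iterable<String> call(String line) {
                    return Arrays.asList(line.split(" "));
                }
            });

            // Map each word to (word, 1) and sum the counts per word.
            JavaPairRDD<String, Integer> counts = words
                .mapToPair(new PairFunction<String, String, Integer>() {
                    public Tuple2<String, Integer> call(String word) {
                        return new Tuple2<String, Integer>(word, 1);
                    }
                })
                .reduceByKey(new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer a, Integer b) {
                        return a + b;
                    }
                });

            // Print each word and its count, matching the output shown further below.
            for (Tuple2<String, Integer> pair : counts.collect()) {
                System.out.println(pair._1() + ": " + pair._2());
            }

            sc.stop();
        }
    }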

Add the corresponding dependencies for spark-core and spark-examples. Make sure the dependency versions match your Spark version; I use Spark 1.1.0, so I use the matching dependencies. My pom.xml looks like this:

    <dependencies>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-examples_2.10</artifactId>
        <version>1.1.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0</version>
      </dependency>
    </dependencies>

Create your jar file using mvn.

    cd example-java-build/JavaWordCount
    mvn package

This creates your fat jar in the target directory. Copy the jar file to any location on the server, then go to the bin folder of your Spark installation (in my case: /root/spark-1.1.0-bin-hadoop2.4/bin).

Submit the job. Mine looks like this:

    ./spark-submit --class "spark.examples.JavaWordCount" \
        --master yarn://myserver1:8032 \
        /root/JavaWordCount-1.0-SNAPSHOT.jar \
        hdfs://myserver1:8020/user/root/hackrfoe.txt

Here, --class is the entry point for your application (for example, org.apache.spark.examples.SparkPi), and --master is the master URL of the cluster (for example spark://23.195.26.187:7077). The last argument is any text file of your choice for the program.

The output, counting the occurrences of each word in the text file, should look something like this:

    in: 17
    sleeping.: 1
    sojourns: 1
    What: 4
    protect: 1
    largest: 1
    other: 1
    public: 1
    worst: 1
    hackers: 12
    detected: 1
    from: 4
    and,: 1
    secretly: 1
    breaking: 1
    football: 1
    answer.: 1
    attempting: 2
    "hacker: 3

Hope this helps!


First you need to build your Java program as a standalone application using Maven (following the example here), and then submit the application with spark-submit.
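As a rough illustration (not the exact example from the linked guide), a standalone Java application only needs a SparkConf and a JavaSparkContext; the master URL and the jar are supplied later by spark-submit. The class name SimpleApp below is just a placeholder:

    package spark.examples;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Hypothetical skeleton of a standalone Spark application; package it as a jar
    // with Maven and pass the master URL via spark-submit rather than in code.
    public final class SimpleApp {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SimpleApp");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // args[0] is assumed to be an input path, e.g. a local or HDFS file.
            JavaRDD<String> lines = sc.textFile(args[0]);
            System.out.println("Number of lines: " + lines.count());

            sc.stop();
        }
    }

Such an application would then be submitted with something along the lines of ./spark-submit --class spark.examples.SimpleApp --master local[*] target/SimpleApp-1.0.jar input.txt (the jar name and input path here are placeholders).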

