How to start Spark interactively in cluster mode

I have a Spark cluster running at

    spark://host1:7077
    spark://host2:7077
    spark://host3:7077

and I connect to it with /bin/spark-shell --master spark://host1:7077. When I try to read a file with

    val textFile = sc.textFile("README.md")
    textFile.count()

the shell keeps printing this warning:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Checking the web UI at host1:8080 shows:

    Workers: 0
    Cores: 0 Total, 0 Used
    Memory: 0.0 B Total, 0.0 B Used
    Applications: 0 Running, 2 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

My question is: how do I specify cores and memory when working in cluster mode with spark-shell? Or do I need to package my Scala code into a .jar file and submit the job to Spark instead?

Thanks.

1 answer

Package your code into a jar and register it in your SparkConf:

    // Register the packaged job jar so the executors can load it
    String[] jars = new String[] { sparkJobJar };
    sparkConf.setMaster("masterip");   // e.g. "spark://host1:7077"
    sparkConf.set("spark.executor.memory", sparkWorkerMemory);
    sparkConf.set("spark.default.parallelism", sparkParallelism);
    sparkConf.setJars(jars);
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

spark.executor.memory sets the memory available to each executor, and spark.default.parallelism controls how many tasks run in parallel across the cluster.
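
Since the asker writes Scala, here is a minimal sketch of the equivalent setup in Scala; the application name, jar path, and resource values are placeholders, and the master URL is the one from the question:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("WordCountJob")                 // placeholder application name
          .setMaster("spark://host1:7077")            // master URL from the question
          .set("spark.executor.memory", "2g")         // memory per executor (example value)
          .set("spark.default.parallelism", "8")      // parallel tasks across the cluster (example value)
          .setJars(Seq("target/word-count-job.jar"))  // hypothetical path to the packaged jar

        val sc = new SparkContext(conf)
        val textFile = sc.textFile("README.md")       // same read as in the question
        println(textFile.count())
        sc.stop()
      }
    }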

There is a slaves file in ../spark/conf; list the IPs (or hostnames) of the worker nodes in it.
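
As a sketch, assuming host1 runs the master and host2/host3 run the workers, conf/slaves would simply list the worker hosts, one per line:

    host2
    host3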

Run the master on the master node: /spark/sbin/start-master.sh

Run the workers with /spark/sbin/start-slaves.sh (run from the master node; it launches a worker on every host listed in the slaves file).
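
Putting the steps together, the sequence could look like this (same paths and master URL as above; exact locations depend on where Spark is installed):

    # on host1, assumed to be the master node
    /spark/sbin/start-master.sh
    /spark/sbin/start-slaves.sh    # starts a worker on every host listed in conf/slaves

    # check that http://host1:8080 now shows the workers, then reconnect the shell
    /bin/spark-shell --master spark://host1:7077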
