How to find the Spark master URL on Amazon EMR

I am new to Spark and am trying to run Spark 1.3.1 on an Amazon EMR cluster. When I do

SparkConf sparkConfig = new SparkConf().setAppName("SparkSQLTest").setMaster("local[2]"); 

it works, but I found out that local[2] is only meant for testing.

When I tried to use cluster mode, I changed it to

 SparkConf sparkConfig = new SparkConf().setAppName("SparkSQLTest").setMaster("spark://localhost:7077"); 

With this, I get the error below:

Tried to associate with unreachable remote address [akka.tcp://sparkMaster@localhost:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused
15/06/10 15:22:21 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@localhost:7077/user/master...
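"Connection refused" means nothing is accepting TCP connections on localhost:7077. A spark:// URL only works when a Spark standalone master is actually running at that address, and a cluster configured for Spark on YARN does not necessarily start one. As a quick diagnostic, here is a minimal Java sketch that checks whether anything is listening on a given host and port. The class and method names (MasterPortCheck, isListening) are hypothetical, not part of Spark.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: not a Spark API, just a plain TCP reachability check.
final class MasterPortCheck {
    // Returns true if something accepts TCP connections at host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: nothing reachable there.
            return false;
        }
    }
}
```

Calling MasterPortCheck.isListening("localhost", 7077, 2000) on the node where the driver runs should return false in the situation described above, which points at a missing standalone master rather than a wrong URL format.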

Can someone please tell me how to set the master URL?

apache-spark amazon-emr spark-streaming
1 answer

If you use the bootstrap action from https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark , Spark is configured to run on YARN. So just set the master to yarn-client or yarn-cluster. Be sure to specify the number of executors, along with executor memory and cores. Read more about Spark on YARN at https://spark.apache.org/docs/latest/running-on-yarn.html
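In the simplest case that means replacing setMaster("spark://localhost:7077") with setMaster("yarn-client"). The minimal Java sketch below collects the settings mentioned above into plain SparkConf key/value pairs; spark.master, spark.executor.instances, spark.executor.memory and spark.executor.cores are real Spark configuration keys, but the helper class and its name (YarnSparkSettings) are hypothetical. Each entry can then be applied with SparkConf.set(key, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds SparkConf entries for Spark 1.x on YARN.
final class YarnSparkSettings {
    static Map<String, String> forYarn(String master, int numExecutors,
                                       int executorMemoryMb, int executorCores) {
        // In Spark 1.x the YARN master values are "yarn-client" and "yarn-cluster".
        if (!master.equals("yarn-client") && !master.equals("yarn-cluster")) {
            throw new IllegalArgumentException("Expected yarn-client or yarn-cluster, got " + master);
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.master", master);
        conf.put("spark.executor.instances", Integer.toString(numExecutors)); // number of executors
        conf.put("spark.executor.memory", executorMemoryMb + "m");           // heap per executor
        conf.put("spark.executor.cores", Integer.toString(executorCores));   // cores per executor
        return conf;
    }
}
```

One design caveat: with yarn-cluster the driver itself runs inside YARN, so that mode is normally chosen when launching (spark-submit --master yarn-cluster) rather than hard-coded in the application; yarn-client is the one that maps directly onto the setMaster call in the question.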

Addendum on executor memory and core settings:

Take a look at the default YARN NodeManager configuration for each instance type at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html , specifically yarn.scheduler.maximum-allocation-mb. You can determine the number of cores from the EC2 instance types page ( http://aws.amazon.com/ec2/instance-types/ ). The maximum executor memory must fit within that maximum allocation minus the Spark memory overhead, in increments of 256 MB. A good description of this calculation is at http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ . Remember that only a little over half of the executor memory can be used for the RDD cache.
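The calculation above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not an official formula: it assumes the Spark 1.3-era YARN overhead default described in the Cloudera post, max(384 MB, 7% of executor memory), and the legacy storage defaults spark.storage.memoryFraction = 0.6 with an internal safety fraction of 0.9 (about 54% of the heap, i.e. "just over half"). The class name ExecutorMemory and the sample allocation values are hypothetical.

```java
// Hypothetical helper illustrating the executor-memory arithmetic.
final class ExecutorMemory {
    // Assumed Spark 1.3-era default: overhead = max(384 MB, 7% of executor memory).
    static final int OVERHEAD_MIN_MB = 384;
    static final double OVERHEAD_FACTOR = 0.07;
    // Step size for executor memory, as suggested in the answer.
    static final int INCREMENT_MB = 256;
    // Assumed legacy defaults: storage fraction 0.6, safety fraction 0.9.
    static final double STORAGE_FRACTION = 0.6;
    static final double SAFETY_FRACTION = 0.9;

    static int overheadMb(int executorMemoryMb) {
        return Math.max(OVERHEAD_MIN_MB, (int) (OVERHEAD_FACTOR * executorMemoryMb));
    }

    // Largest multiple of INCREMENT_MB whose memory plus overhead fits in one YARN container.
    // memory + overhead grows with memory, so the loop stops at the first size that no longer fits.
    static int maxExecutorMemoryMb(int maxAllocationMb) {
        int best = 0;
        for (int m = INCREMENT_MB; m + overheadMb(m) <= maxAllocationMb; m += INCREMENT_MB) {
            best = m;
        }
        return best;
    }

    // Approximate heap available for cached RDDs (about 54% of executor memory).
    static int rddCacheMb(int executorMemoryMb) {
        return (int) (executorMemoryMb * STORAGE_FRACTION * SAFETY_FRACTION);
    }
}
```

For a hypothetical yarn.scheduler.maximum-allocation-mb of 11520, maxExecutorMemoryMb returns 10752 (10752 MB plus 752 MB overhead is 11504 MB, while the next step, 11008 MB, would need 11778 MB), and roughly 5806 MB of that would be usable for the RDD cache. For 2048 it returns 1536, because the 384 MB minimum overhead dominates at small sizes.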
