How do I configure dynamic allocation in Apache Spark?

I followed the instructions here to configure dynamic allocation for the YARN resource manager.

However, step 3 puzzles me: "Add this jar to the classpath of all NodeManagers in your cluster."

Does this mean logging in to each node server and adding the path to the shuffle jar to the PATH environment variable, i.e. export PATH=$PATH:<loc-to-shuffle.jar>?

1 answer

It means that on every NodeManager the jar must be on the YARN classpath (not the shell PATH). You can set the yarn.application.classpath property in the yarn-site.xml file, which holds a comma-separated list of CLASSPATH entries.
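
As a sketch, that property lives in yarn-site.xml on each NodeManager. The value below is illustrative only; the /opt/spark/yarn path is an assumption for this example, not a standard location:

```xml
<!-- yarn-site.xml (illustrative; the /opt/spark/yarn path is an assumption) -->
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    /opt/spark/yarn/*
  </value>
</property>
```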

When this property is empty, the following default CLASSPATH for YARN applications is used.

For Linux: $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

For Windows: %HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, %HADOOP_COMMON_HOME%/share/hadoop/common/lib/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*

So, put spark-<version>-yarn-shuffle.jar into one of the directories listed in the yarn.application.classpath property, or into one of the default classpath directories.

You can also create a symbolic link to spark-<version>-yarn-shuffle.jar in one of those YARN classpath directories.
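
For example, on one NodeManager this could look like the following. This is only a sketch: SPARK_HOME, HADOOP_YARN_HOME, and the location of the shuffle jar inside the Spark distribution are assumptions that vary by installation:

```shell
#!/bin/sh
# Assumed install locations -- adjust for your cluster.
SPARK_HOME=/opt/spark
HADOOP_YARN_HOME=/opt/hadoop

# A directory that is already on the default YARN classpath.
DEST="$HADOOP_YARN_HOME/share/hadoop/yarn/lib"

# Either copy the shuffle jar there...
# cp "$SPARK_HOME"/yarn/spark-*-yarn-shuffle.jar "$DEST"/
# ...or symlink it, so upgrading Spark updates the jar in place:
# ln -s "$SPARK_HOME"/yarn/spark-*-yarn-shuffle.jar "$DEST"/

echo "target classpath dir: $DEST"
```

The actual copy/symlink commands are commented out above because they need write access on each node; repeat the step on every NodeManager and then restart the NodeManager service so YARN picks up the new classpath entry.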

Hope this helps ...

