Initialize PySpark to predefine the SparkContext 'sc' variable

When using PySpark, I would like the SparkContext to be initialized automatically (in yarn-client mode) whenever a new notebook is created.

The following tutorials describe how to do this for earlier versions of IPython/Jupyter (< 4):

https://www.dataquest.io/blog/pyspark-installation-guide/

https://npatta01.imtqy.com/2015/07/22/setting_up_pyspark/

I'm not quite sure how to achieve this with Jupyter >= 4, given what is stated in http://jupyter.readthedocs.io/en/latest/migrating.html#since-jupyter-does-not-have-profiles-how-do-i-customize-it

I can manually create and configure the SparkContext myself, but I don't want our analysts to have to worry about this.
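For context, this is roughly the boilerplate that would otherwise have to be repeated at the top of every notebook (a minimal sketch; the app name is just an illustrative assumption):

 from pyspark import SparkConf, SparkContext

 # Per-notebook setup I would like analysts not to have to write themselves
 conf = SparkConf().setMaster("yarn-client").setAppName("analyst-notebook")
 sc = SparkContext(conf=conf)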

Does anyone have any ideas?

1 answer

Well, the missing profile functionality in Jupyter puzzled me in the past as well, although for a different reason - I wanted to be able to switch between different deep learning frameworks (Theano and TensorFlow) on demand; I eventually found a solution (described in my blog post here).

The point is that, although Jupyter has no profiles, the startup files for the IPython kernel still exist, and since PySpark uses that particular kernel, they can be used in your case.

So, assuming your PySpark is already set up to work with Jupyter, all you need to do is create a script init_spark.py along these lines:

from pyspark import SparkConf, SparkContext

# Run Spark in YARN client mode and expose the context as 'sc'
conf = SparkConf().setMaster("yarn-client")
sc = SparkContext(conf=conf)

and place it in the ~/.ipython/profile_default/startup/ directory.

You can then verify that sc is already defined when Jupyter starts:

 In [1]: sc
 Out[1]: <pyspark.context.SparkContext at 0x7fcceb7c5fd0>

 In [2]: sc.version
 Out[2]: u'2.0.0'
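As a quick sanity check (my own illustrative example, not part of the original setup), any notebook should now be able to use sc directly without further configuration:

 In [3]: sc.parallelize(range(100)).sum()
 Out[3]: 4950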

Another option might be Apache Toree, although I have not tried it myself.

