I use Spark 1.6.0 on three virtual machines: one master (standalone mode) and two workers, each with 8 GB of RAM and 2 processors.
I am using the Jupyter kernel configuration below:
{ "display_name": "PySpark ", "language": "python3", "argv": [ "/usr/bin/python3", "-m", "IPython.kernel", "-f", "{connection_file}" ], "env": { "SPARK_HOME": "<mypath>/spark-1.6.0", "PYTHONSTARTUP": "<mypath>/spark-1.6.0/python/pyspark/shell.py", "PYSPARK_SUBMIT_ARGS": "--master spark://<mymaster>:7077 --conf spark.executor.memory=2G pyspark-shell --driver-class-path /opt/vertica/java/lib/vertica-jdbc.jar" } }
This is currently working. I can use the sc and sqlContext contexts without importing anything, just as in the pyspark shell.
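For what it's worth, this is the kind of cell I run to check that both contexts are injected by shell.py (just a sanity check, nothing specific to my data):

    # Both sc and sqlContext already exist in the notebook namespace,
    # because PYTHONSTARTUP points at pyspark/shell.py.
    rdd = sc.parallelize([1, 2, 3, 4])
    print(rdd.sum())  # -> 10

    df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.show()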
The problem occurs when I use several notebooks: in my Spark master web UI, I see two pyspark-shell applications, which makes sense, but only one of them can be running at a time. Here, "running" does not mean actually doing anything: even if I am not executing anything in the notebook, the application still shows as running. Because of this, I cannot share my resources between notebooks, which is a pity (currently I have to kill the first shell (i.e. the kernel of the first notebook) to start the second).
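To clarify what I mean by "sharing": my understanding of the standalone-mode docs is that an application grabs all available cores unless spark.cores.max is set, so I was considering recreating the context in each notebook with a per-application cap, something like the untested sketch below (the values are placeholders, I have not confirmed this is the right approach):

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    # Stop the context that shell.py created, then rebuild it with a cap on
    # cores so the second notebook's pyspark-shell can still get executors.
    sc.stop()
    conf = (SparkConf()
            .setAppName("notebook-1")
            .setMaster("spark://<mymaster>:7077")
            .set("spark.executor.memory", "2G")
            .set("spark.cores.max", "2"))  # placeholder cap per application
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)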
If you have ideas on how to do this, please let me know! Also, I'm not sure the way I work with kernels is "best practice"; I already had trouble getting Spark and Jupyter to work together in the first place.
Thanks all!