Launch Spark + Scala + Jupyter on Dataproc

I have not yet managed to get Spark, Scala and Jupyter to work together. Does anyone have a simple recipe? Which version of each component did you use?

2 answers

Apache Toree is compatible with the Dataproc 1.0 image, which currently includes Spark 1.6.1. I tried unsuccessfully to use it with the preview image, which includes a Spark 2.0 preview. To install Toree on the Dataproc master node, you can run:

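# Install pip for Python 3, then Jupyter for the current user
sudo apt install python3-pip
pip3 install --user jupyter
# Point Toree at the Spark installation on the Dataproc image
export SPARK_HOME=/usr/lib/spark
pip3 install --pre --user toree
# Put the user-level jupyter executable on the PATH before registering the kernel
export PATH=$HOME/.local/bin:$PATH
jupyter toree install --user --spark_home=$SPARK_HOME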
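
After the install commands above, it may help to confirm that Jupyter can see the new kernel. The following is only a sketch: `has_toree_kernel` is a hypothetical helper, and matching on the string "toree" is an assumption about how the kernel is named in the output of `jupyter kernelspec list`.

```shell
# Hypothetical helper: reads `jupyter kernelspec list` output on stdin and
# succeeds if any line mentions "toree" (assumed to appear in the kernel name).
has_toree_kernel() {
  grep -qi 'toree'
}

if command -v jupyter >/dev/null 2>&1; then
  if jupyter kernelspec list | has_toree_kernel; then
    echo "Toree kernel found"
  else
    echo "Toree kernel not found"
  fi
else
  echo "jupyter is not on PATH"
fi
```

If the kernel is missing, checking that `$HOME/.local/bin` is on `PATH` and that `SPARK_HOME` is set is a reasonable first step.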

Spark is included in standard Dataproc clusters.

Using gcloud, create a Dataproc cluster (here named "dplab") with Jupyter listening on port 8124:

$ gcloud dataproc clusters create dplab \
 --initialization-actions \
     gs://dataproc-initialization-actions/jupyter/jupyter.sh \
 --metadata "JUPYTER_PORT=8124" \
 --zone=us-central1-c

Then SSH into the master node, forwarding the Jupyter port to your local machine:

$ gcloud compute ssh dplab-m \
 --ssh-flag="-Llocalhost:8124:localhost:8124" --zone=us-central1-c

Then open localhost:8124 in a browser to reach Jupyter.
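
The cluster name, port, and zone appear in both gcloud commands above, so one way to keep them consistent is to hold them in variables. This is a rough sketch that only prints the two commands for review; the variable and function names (`CLUSTER`, `PORT`, `ZONE`, `create_cmd`, `tunnel_cmd`) are made up for illustration, and the flags are the ones used in this answer.

```shell
# Defaults taken from the answer above; override by setting the variables.
CLUSTER="${CLUSTER:-dplab}"
PORT="${PORT:-8124}"
ZONE="${ZONE:-us-central1-c}"

# Print the cluster-creation command with the Jupyter initialization action.
create_cmd() {
  printf '%s\n' "gcloud dataproc clusters create ${CLUSTER} --initialization-actions gs://dataproc-initialization-actions/jupyter/jupyter.sh --metadata JUPYTER_PORT=${PORT} --zone=${ZONE}"
}

# Print the SSH command that forwards the Jupyter port from the master node
# (named "<cluster>-m", as in the answer above) to localhost.
tunnel_cmd() {
  printf '%s\n' "gcloud compute ssh ${CLUSTER}-m --ssh-flag=-Llocalhost:${PORT}:localhost:${PORT} --zone=${ZONE}"
}

create_cmd
tunnel_cmd
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be pasted into a terminal once it looks right.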
