Where is the Spark user interface in Google Dataproc?

Which port should I use to access the Spark interface in Google Dataproc?

I tried ports 4040 and 7077, as well as a bunch of other ports I found using netstat -pln.

The firewall is configured correctly.

+8
apache-spark google-cloud-dataproc
2 answers

Dataproc runs Spark on top of YARN, so you won't find the typical Spark standalone ports. Instead, once you start a Spark job, you can visit port 8088, which shows the main page of the YARN ResourceManager. Any running Spark job is reachable through its Application Master link on that page. The Spark Application Master's page looks just like the familiar Spark standalone landing page that you would normally find on port 8080 with default Spark settings.
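
For example, one quick way to get a running application to inspect in the ResourceManager UI is to submit one of the bundled Spark examples. This is only a sketch: the cluster name is a placeholder, and the examples jar path can vary by image version.

    # Submit the SparkPi example so an application shows up in the YARN UI on port 8088.
    # "my-cluster" is a placeholder; the examples jar location may differ per image,
    # and newer gcloud versions may also require a --region flag.
    gcloud dataproc jobs submit spark \
        --cluster=my-cluster \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
        -- 1000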

Since the workers register over the internal network, the YARN links use the internal cluster hostnames (the hostnames are prefixed with your Dataproc cluster name). This means that if you access the UI from outside the network, the links may not work at first; if you are using a firewall-based approach, you need to replace the hostname with the node's external IP address.
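
If you do go the firewall route, one way to look up the external IPs to substitute is sketched below; the cluster name is a placeholder, and the filter simply matches instance names starting with it.

    # List the external (NAT) IPs of the cluster's nodes; "my-cluster" is hypothetical.
    gcloud compute instances list \
        --filter='name ~ ^my-cluster' \
        --format='table(name, networkInterfaces[0].accessConfigs[0].natIP)'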

A simpler experience would be to use the SOCKS proxy approach, as described here: https://cloud.google.com/dataproc/cluster-web-interfaces

In that case, you just run gcloud compute ssh to start a lightweight local SOCKS proxy, then open a browser pointed at that proxy, and you can click through all the YARN links as usual.
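
Roughly, following the linked documentation, that looks like the sketch below; the master node name, zone, and proxy port are placeholders for your own cluster.

    # Start a local SOCKS proxy on port 1080 through the cluster's master node.
    # "my-cluster-m" and the zone are placeholders.
    gcloud compute ssh my-cluster-m --zone=us-central1-a -- -D 1080 -N

    # In another terminal, launch a browser that routes its traffic through the proxy
    # (Chrome shown here); internal hostnames such as my-cluster-m then resolve.
    google-chrome \
        --proxy-server="socks5://localhost:1080" \
        --user-data-dir=/tmp/my-cluster-m-profile \
        http://my-cluster-m:8088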

+15

When following the instructions in Denis' answer, I found that I could not connect to ports 8080 or 8088 on a Dataproc image v1.0 cluster.

The open ports on the master node suggested trying 18080; after checking the documentation for port 18080 (the Spark history server), voilà: access to the web UI.
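
One way to check which web UI ports are listening on the master node is sketched below; the cluster name and zone are placeholders, and netstat is assumed to be available on the image.

    # See which web UI ports are listening on the master node.
    # "my-cluster-m" and the zone are placeholders for your cluster.
    gcloud compute ssh my-cluster-m --zone=us-central1-a \
        --command='sudo netstat -plnt | grep -E ":8080|:8088|:18080"'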

-1
