Understanding the Spark monitoring interface

For a currently running Spark job, this is the stage detail page of the web UI at the URL: http://localhost:4040/stages/stage/?id=1&attempt=0

[Screenshot: Spark UI stage detail page]

The documentation at http://spark.apache.org/docs/1.2.0/monitoring.html does not describe these columns. What do the “Input”, “Write Time” and “Shuffle Write” columns mean?

As can be seen in the screenshot, these 4 tasks completed within 1.3 minutes, and I am trying to work out whether there is a bottleneck here.

Spark is configured to use 4 cores; I assume that is why 4 tasks are shown in the UI. Does each task run on one core?

What determines the size of "Shuffle Write"?
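Spark 1.2.0 does not document the column, but broadly “Shuffle Write” is the number of bytes of serialized map output written to disk for the next stage, so its size depends on how many records survive the map side and how large they serialize. Here is a hedged, pure-Python sketch of the idea (no Spark involved; none of these names are Spark internals):

```python
import pickle

def hash_partition(records, num_partitions):
    """Group (key, value) pairs into buckets by hash of the key,
    the way a shuffle's map side routes records to reducers."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
buckets = hash_partition(records, num_partitions=4)

# "Shuffle write" is roughly the serialized size of all the bucketed
# map output, so fewer/smaller records => smaller shuffle write.
shuffle_write_bytes = sum(len(pickle.dumps(b)) for b in buckets)
```

In real Spark, pre-aggregating on the map side (e.g. `reduceByKey` instead of `groupByKey`) shrinks this number, because fewer bytes have to be written and transferred.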

An excerpt from the log:

15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:34 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:34 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
.....................

, "enter" 100,9 ( Spark UI) ?

