Spark - Shuffle Read Blocked Time

Recently, I was tuning the performance of some large, shuffle-heavy jobs. Looking at the Spark UI, I noticed a metric called "Shuffle Read Blocked Time" in the additional metrics section.

This "Shuffle Read Blocked Time" appears to account for more than 50% of the task duration for a large number of tasks.

While I can intuit some of what it means, I cannot find any documentation that explains what it actually represents. Needless to say, I also could not find any resources on mitigation strategies.

Can someone give me an idea of how I can reduce Shuffle Read Blocked Time?

1 answer

"Shuffle Read Blocked Time" is the time that tasks spent blocked waiting for shuffle data to be read from remote machines. The exact metric it is derived from is shuffleReadMetrics.fetchWaitTime.
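
To see that number outside the UI, you can hook into Spark's listener API. Below is a minimal sketch (not from the original answer) that logs each finished task's fetch wait time; it assumes you have a `SparkContext` in scope as `sc` and a reasonably recent Spark 2.x where `taskMetrics.shuffleReadMetrics` is available directly:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs, per task, the time spent blocked fetching remote shuffle blocks.
// This is the same quantity the UI shows as "Shuffle Read Blocked Time".
class FetchWaitListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      val waitMs = metrics.shuffleReadMetrics.fetchWaitTime
      println(s"Stage ${taskEnd.stageId}, task ${taskEnd.taskInfo.taskId}: " +
        s"blocked $waitMs ms waiting for remote shuffle blocks")
    }
  }
}

// Register the listener before running the shuffle-heavy job:
// sc.addSparkListener(new FetchWaitListener())
```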

It is difficult to suggest a mitigation strategy without knowing what data you are reading or which remote machines you are reading it from. However, consider the following:

  • Check the network connection to the remote machines you are reading shuffle data from.
  • Check your code/jobs to make sure you read only the data you actually need to finish the job (see the sketch after this list, which also touches on the partitioning point below).
  • In some cases, consider splitting the work into several jobs that run in parallel, provided they are independent of each other.
  • You could also add more nodes to your cluster so the workload can be split more finely, which should shorten the overall waiting time.
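
As a rough illustration of the second and fourth points, here is a sketch using hypothetical column names (`customer_id`, `amount`, `event_date`) and a hypothetical input path: project and filter before the aggregation so less data is shuffled and fetched from remote machines, and raise the shuffle parallelism so each fetch is smaller. Treat the partition count and paths as placeholders to adapt, not as recommended values:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .appName("shuffle-read-example")
  // More, smaller shuffle partitions: each task fetches less remote data.
  .config("spark.sql.shuffle.partitions", "400")
  .getOrCreate()
import spark.implicits._

val orders = spark.read.parquet("/data/orders") // hypothetical input

val totals = orders
  .select("customer_id", "amount", "event_date") // keep only needed columns
  .filter($"event_date" >= "2016-01-01")         // drop unneeded rows before the shuffle
  .groupBy("customer_id")
  .agg(sum("amount").as("total_amount"))
```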

Regarding metrics, this documentation should shed light on them: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-webui-StagePage.html

Finally, it was also difficult for me to find information about Shuffle Read Blocked Time, but if you put the term in quotation marks in a Google search, e.g. "Shuffle Read Blocked Time", you will get good results.
