How to profile Hadoop cluster performance

Does anyone know how to profile the performance of all Java code running in a Hadoop cluster?

Let me explain with a simple example. When developing Java locally, we can run YourKit to measure the percentage of CPU time consumed by each method of each class. We might see that class A calls method X and that this takes 90% of the execution time of the entire application, which pinpoints the inefficient code.
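To illustrate where a figure like "90% in method X" comes from: a sampling profiler periodically records stack traces and reports, for each method, the share of samples in which that method appears anywhere on the stack (its "inclusive" time). The sketch below computes that share from a list of already-collected samples. The class `InclusiveShare` and its method are hypothetical illustrations, not how YourKit is implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: each sample is a list of "Class.method" frames,
 * outermost first. Returns, per method, the percentage of samples in which
 * the method appears at least once (inclusive time).
 */
class InclusiveShare {

    static Map<String, Double> inclusivePercent(List<List<String>> samples) {
        Map<String, Integer> hits = new HashMap<>();
        for (List<String> stack : samples) {
            // Count a method once per sample, so recursion does not inflate its share.
            Set<String> seen = new HashSet<>(stack);
            for (String frame : seen) {
                hits.merge(frame, 1, Integer::sum);
            }
        }
        Map<String, Double> percent = new HashMap<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            percent.put(e.getKey(), 100.0 * e.getValue() / samples.size());
        }
        return percent;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
            List.of("App.main", "A.x", "A.parse"),
            List.of("App.main", "A.x", "A.parse"),
            List.of("App.main", "A.x"),
            List.of("App.main", "B.y"));
        // A.x appears in 3 of 4 samples, so it is reported at 75%.
        System.out.println(inclusivePercent(samples));
    }
}
```

The more samples are taken, the closer these percentages get to the true time distribution; that is the trade-off between sampling rate and overhead the question mentions.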

But when we run a MapReduce job on the cluster, I would also like to see what is slow: our map/reduce code or the infrastructure itself. So I would like a service that receives information about each class/method call and the percentage of time spent in it, collects this somewhere in HDFS, and then lets me analyze the method call tree together with its CPU consumption.

Question: Does anyone know whether such a solution exists?

P.S. I understand that such a thing will slow down the cluster, and that it should be done either on a test cluster or in agreement with the client. The question is simply: does such a thing exist? Thanks.

3 answers

A working approach is described here: http://ihorbobak.com/index.php/2015/08/05/cluster-profiling/

In brief, it works like this:

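  • A jar (a modification of StatsD JVM Profiler) is attached as a javaagent to each JVM running on the cluster.
  • A "javaagent" is a library that the JVM loads at startup and that runs inside the same JVM as the application. The profiler javaagent samples the JVM's stack traces every 100 ms and sends them to the NoSQL database InfluxDB (https://influxdb.com).
  • The collected stack traces are then aggregated and visualized as a Flame Graph.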
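A rough sketch of what such an agent does, using only the JDK (this is not the StatsD JVM Profiler source). `premain` is the entry point the JVM calls for an agent jar whose manifest declares a `Premain-Class`; here it starts a daemon thread that snapshots all thread stacks every 100 ms. Instead of sending samples to InfluxDB, this version keeps counts in memory in the "folded" format (`root;...;leaf count`) that Brendan Gregg's flamegraph.pl reads. The class name `SamplingAgent` and the handling of the agent argument as an interval are illustrative assumptions.

```java
import java.lang.instrument.Instrumentation;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sampling agent. When packaging it as a real agent jar, declare
 * the class public (in SamplingAgent.java) and set "Premain-Class: SamplingAgent".
 */
class SamplingAgent {

    /** Folded stack ("root;...;leaf") mapped to the number of samples seen. */
    static final Map<String, Long> FOLDED = new ConcurrentHashMap<>();

    /** Called by the JVM for -javaagent:sampler.jar=100 (argument = interval in ms). */
    public static void premain(String agentArgs, Instrumentation inst) {
        long intervalMs = (agentArgs == null || agentArgs.isEmpty()) ? 100 : Long.parseLong(agentArgs);
        start(intervalMs);
    }

    static ScheduledExecutorService start(long intervalMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "sampling-agent");
            t.setDaemon(true); // must never keep the task JVM alive
            return t;
        });
        ses.scheduleAtFixedRate(SamplingAgent::sampleOnce, 0, intervalMs, TimeUnit.MILLISECONDS);
        return ses;
    }

    /** Takes one snapshot of every thread's stack and counts it. */
    static void sampleOnce() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey() == Thread.currentThread()) continue; // skip the sampler itself
            StackTraceElement[] frames = e.getValue();
            if (frames.length == 0) continue;
            FOLDED.merge(fold(frames), 1L, Long::sum);
        }
    }

    /** Stack traces are leaf-first; folded stacks are root-first, joined by ';'. */
    static String fold(StackTraceElement[] frames) {
        StringBuilder sb = new StringBuilder();
        for (int i = frames.length - 1; i >= 0; i--) {
            if (sb.length() > 0) sb.append(';');
            sb.append(frames[i].getClassName()).append('.').append(frames[i].getMethodName());
        }
        return sb.toString();
    }

    /** One "stack count" line per distinct stack, ready for flamegraph.pl. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        FOLDED.forEach((stack, count) -> sb.append(stack).append(' ').append(count).append('\n'));
        return sb.toString();
    }
}
```

On a cluster, the jar would have to exist on every node and the `-javaagent:` flag would have to reach the task JVMs (for example via `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts`). Writing `dump()` out at JVM shutdown, or shipping samples to a time-series database as the original tool does, is left out of this sketch.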

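Flame graphs are explained here: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html. Brendan Gregg also has a good talk on the subject: https://www.youtube.com/watch?v=nZfNehCzGdw. His book "System Performance: Enterprise and the Cloud" is worth reading as well.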

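To profile cluster nodes running Spark, Hadoop and non-JVM processes, we use perf and, like Ihor, visualize the results as FlameGraphs.

The tool we built is available at https://github.com/cerndb/Hadoop-Profiler.

For Hadoop, the procedure is: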

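  • Run the Hadoop application.
  • HProfiler queries the YARN API to find out on which nodes the application is running.
  • It connects to those nodes over SSH and starts perf there.
  • When sampling is finished, the results are fetched back over SSH. Java frames are resolved to method names with the help of perf-map-agent [] so that the stacks show Java code.
  • Finally, the collected stacks are aggregated and visualized as a flame graph, which can also be produced per node.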
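Merging the per-node results is straightforward once each node's perf output has been collapsed into folded stacks (for example with stackcollapse-perf.pl from the FlameGraph repository). A minimal sketch, assuming one folded-stack string per host and well-formed `stack count` lines; the class `FoldedMerge` and the `perNode` switch are hypothetical and not part of Hadoop-Profiler. With `perNode` set, the host name becomes the root frame, so the flame graph shows one tower per node; without it, identical stacks from all nodes are summed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical merge step for folded stacks ("frame;frame;frame count") from several nodes. */
class FoldedMerge {

    static Map<String, Long> merge(Map<String, String> foldedByNode, boolean perNode) {
        Map<String, Long> merged = new TreeMap<>(); // sorted output, like flamegraph.pl input
        for (Map.Entry<String, String> node : foldedByNode.entrySet()) {
            for (String raw : node.getValue().split("\n")) {
                String line = raw.trim();
                int sp = line.lastIndexOf(' ');
                if (sp <= 0) continue; // skip blank or malformed lines
                String stack = line.substring(0, sp);
                long count = Long.parseLong(line.substring(sp + 1));
                // Optionally prefix the host so each node gets its own subtree.
                String key = perNode ? node.getKey() + ";" + stack : stack;
                merged.merge(key, count, Long::sum);
            }
        }
        return merged;
    }
}
```

Writing each entry of the returned map as `stack count` gives a file that flamegraph.pl can render directly.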

More details are available here:

https://db-blog.web.cern.ch/blog/joeri-hermans/2016-04-hadoop-performance-troubleshooting-stack-tracing-introduction

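According to the documentation, you can enable task profiling in the driver class with JobConf.setProfileEnabled(boolean), which is disabled by default. The profiler arguments and the range of tasks to profile can be set with JobConf.setProfileParams(String) and JobConf.setProfileTaskRange(boolean, String). Hope this gives you a starting point.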

There is also a good blog post about troubleshooting this process, and a tool. For your information only, not an endorsement.