Does anyone know how to profile the performance of all the Java code running in a Hadoop cluster?
I will explain with a simple example. In local Java development, we can run YourKit and measure the percentage of CPU time consumed by each method of each class. We see that class A calls method X, which takes 90% of the execution time of the whole application, and that exposes the inefficiency in the code.
But when we execute a MapReduce job on the cluster, I would also like to see what is slow: our map/reduce code or the infrastructure itself. So I would like a service that receives information about every class/method call and the percentage of time spent executing it, collects it somewhere in HDFS, and then lets us analyze the method call tree together with CPU consumption. For concreteness, the sketch after this paragraph shows the kind of per-task profiling I have in mind.
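A minimal sketch of the closest built-in mechanism I am aware of: Hadoop's per-task profiling properties (assuming Hadoop 2.x property names such as `mapreduce.task.profile`; the task ranges and HPROF options below are just illustrative values). This profiles a few selected tasks with the JVM's HPROF sampler and leaves the profile output next to the task logs, which is roughly the kind of data collection described above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ProfiledJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Enable per-task profiling for a small sample of tasks so the
        // whole cluster is not slowed down.
        conf.setBoolean("mapreduce.task.profile", true);
        conf.set("mapreduce.task.profile.maps", "0-2");     // map task ids to profile (example range)
        conf.set("mapreduce.task.profile.reduces", "0-2");  // reduce task ids to profile (example range)

        // HPROF CPU sampling options; %s is replaced with the per-task output file.
        conf.set("mapreduce.task.profile.params",
                 "-agentlib:hprof=cpu=samples,depth=8,force=n,thread=y,verbose=n,file=%s");

        Job job = Job.getInstance(conf, "profiled-job");
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This only profiles individual task JVMs, though; it does not aggregate call trees across the cluster into HDFS, which is the part I am really asking about.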
Question: does anyone know if such a solution exists?
P.S. I understand that such a thing will slow down the cluster, and that it should be done either on a test cluster or with the client's agreement. The question is only: does such a thing exist? Thanks.