Getting More Information from Rprof()

I was trying to understand which parts of some R code I wrote are the time hogs, so I used Rprof. The result is not very useful so far:

    > summaryRprof()
    $by.self
                     self.time self.pct total.time total.pct
    "$<-.data.frame"      2.38     23.2       2.38      23.2
    "FUN"                 2.04     19.9      10.20      99.6
    "[.data.frame"        1.74     17.0       5.54      54.1
    "[.factor"            1.42     13.9       2.90      28.3
    ...

Is there a way to dig deeper and find out which specific calls to $<-.data.frame, FUN (probably from by()), etc. are really the culprits? Or will I need to reorganize the code into smaller functions to get finer-grained results?

The only reason I resist refactoring is that I would have to pass data structures to functions, and everything is passed by value, so this seems like a step in the wrong direction.

Thanks.

3 answers

The existing CRAN packages profr and proftools are useful for this. The latter can use Rgraphviz, which is not always installed.

The R Wiki page on profiling contains additional information and a nice script from Romain that can also render the profile (but it requires graphviz).


Rprof takes call stack samples at regular time intervals, which is good news.
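As a minimal sketch of what those samples look like, the following profiles a throwaway workload and reads the raw file back. The `burn()` function and the temporary file are made up for illustration, and default `Rprof()` settings are assumed (no memory or line profiling).

```r
# Hypothetical workload, only here to keep the CPU busy for a moment.
burn <- function() {
  x <- 0
  for (i in 1:2e6) x <- x + sqrt(i)
  x
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start sampling the call stack
invisible(burn())
Rprof(NULL)                        # stop sampling

raw <- readLines(prof_file)
# The first line records the sampling interval; every following line is one
# stack sample, with the innermost (most recent) call listed first.
print(head(raw, 5))
```

How many sample lines show up depends on how long the workload runs on a given machine, so the exact contents will vary.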

What I would do is get at the raw stack samples that it collects, pick a few at random, and examine them. I look for call sites (not just functions, but places where one function calls another) that appear on several samples. For example, if a call site appears on 50% of the samples, then that is what it costs, because removing it would save about 50% of the total time. (It seems obvious, right? But it is not well known.)

Not every expensive call site can be optimized, but some of them can, unless the program is already as fast as possible.

(Don't be distracted by questions such as how many samples you need to look at. If something would save you a reasonable fraction of the time, it appears on a similar fraction of the samples. The exact number doesn't matter; what matters is that you find it. Also, don't be distracted by graphs, recursion, timing, and call counts. For each call site, what matters is the fraction of stack samples that show it.)
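The idea above can be sketched in base R roughly as follows. `call_site_fractions()` is a hypothetical helper, not part of any package, and the sample lines are invented to resemble the function names in the question. It assumes the default Rprof file format, in which each line lists quoted function names with the innermost call first.

```r
# Hypothetical helper: for each call site (caller -> callee pair), compute the
# fraction of stack samples on which it appears at least once.
call_site_fractions <- function(lines) {
  samples <- lines[nzchar(lines) & !grepl("^sample.interval=", lines)]
  sites_per_sample <- lapply(samples, function(s) {
    fns <- regmatches(s, gregexpr('"[^"]*"', s))[[1]]
    # Strip the quotes and reverse so the outermost call comes first.
    fns <- rev(gsub('"', "", fns, fixed = TRUE))
    if (length(fns) < 2) return(character(0))
    # unique() counts a site once per sample, so recursion is not overcounted.
    unique(paste(head(fns, -1), tail(fns, -1), sep = " -> "))
  })
  counts <- table(unlist(sites_per_sample))
  fractions <- setNames(as.vector(counts) / length(samples), names(counts))
  sort(fractions, decreasing = TRUE)
}

# Invented sample lines in the default Rprof format (illustration only).
example_lines <- c(
  "sample.interval=20000",
  '"$<-.data.frame" "$<-" "FUN" "lapply" "by" ',
  '"[.factor" "[" "[.data.frame" "[" "FUN" "lapply" "by" ',
  '"$<-.data.frame" "$<-" "FUN" "lapply" "by" ',
  '"[.data.frame" "[" "FUN" "lapply" "by" '
)

# Look at a couple of raw samples picked at random, then at the fractions.
print(sample(example_lines[-1], 2))
fractions <- call_site_fractions(example_lines)
print(fractions)
```

In this made-up data the site `FUN -> $<-` is on half of the samples, so by the argument above, removing that call would save roughly half of the run time.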


Parsing the output file generated by Rprof yourself is not too complicated, and then you have access to everything.
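A rough sketch of such a parser, assuming the default file format (a `sample.interval=` header followed by one line of quoted function names per sample, innermost call first). `read_rprof_stacks()` and `self_and_total()` are hypothetical names, and the file contents are invented for illustration.

```r
# Hypothetical helper: read an Rprof file into a list of call stacks, each
# ordered from outermost to innermost call. Memory and line profiling
# annotations are not handled.
read_rprof_stacks <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines) & !grepl("^sample.interval=", lines)]
  # scan() splits on whitespace and strips the surrounding double quotes.
  lapply(lines, function(s) rev(scan(text = s, what = character(), quiet = TRUE)))
}

# Hypothetical helper: per-function share of samples where the function is
# the innermost call (self) or appears anywhere on the stack (total).
self_and_total <- function(stacks) {
  n <- length(stacks)
  fns <- sort(unique(unlist(stacks)))
  leaf <- vapply(stacks, function(s) s[length(s)], character(1))
  self <- tabulate(match(leaf, fns), nbins = length(fns))
  total <- tabulate(match(unlist(lapply(stacks, unique)), fns), nbins = length(fns))
  data.frame(fn = fns, self.pct = 100 * self / n, total.pct = 100 * total / n,
             stringsAsFactors = FALSE)
}

# Invented profile data written to a temporary file (illustration only).
path <- tempfile(fileext = ".out")
writeLines(c(
  "sample.interval=20000",
  '"$<-.data.frame" "$<-" "FUN" "lapply" "by" ',
  '"[.factor" "[" "[.data.frame" "[" "FUN" "lapply" "by" ',
  '"$<-.data.frame" "$<-" "FUN" "lapply" "by" ',
  '"[.data.frame" "[" "FUN" "lapply" "by" '
), path)

stacks <- read_rprof_stacks(path)
res <- self_and_total(stacks)
print(res[order(-res$total.pct), ])
```

Once the stacks are in a plain list, any other question (which callers lead to a given function, which paths are most common) becomes ordinary list manipulation.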

