Can gprof be used to profile a multithreaded program that uses pthreads? That is, will its output include the time used in all threads?
Yes, this is possible using the workaround described here.
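For illustration, here is a minimal sketch of one commonly cited workaround of this kind, which may or may not be the one linked above. gprof's sampling is driven by the `ITIMER_PROF` interval timer, and the idea is to copy the creating thread's timer settings into every new thread before its start routine runs. The names `profiled_pthread_create`, `profiled_trampoline`, `struct profiled_start` and `sum_worker` are hypothetical and exist only for this example. Whether re-arming the timer is still needed depends on your C library and kernel, so treat that as an assumption to verify on your system.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical helper type: the real start routine, its argument, and the
   profiling timer settings captured in the creating thread. */
struct profiled_start {
    void *(*start_routine)(void *);
    void *arg;
    struct itimerval prof_timer;
};

/* Runs first in the new thread: re-arms ITIMER_PROF with the captured
   settings, then hands control to the real start routine. */
static void *profiled_trampoline(void *p)
{
    struct profiled_start local = *(struct profiled_start *)p;
    free(p);
    setitimer(ITIMER_PROF, &local.prof_timer, NULL);
    return local.start_routine(local.arg);
}

/* Drop-in replacement for pthread_create (assumption: you call this
   explicitly instead of interposing on pthread_create itself). */
int profiled_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                            void *(*start_routine)(void *), void *arg)
{
    struct profiled_start *ps = malloc(sizeof *ps);
    if (ps == NULL)
        return EAGAIN;
    ps->start_routine = start_routine;
    ps->arg = arg;
    /* Capture the timer that the -pg startup code armed in this thread. */
    getitimer(ITIMER_PROF, &ps->prof_timer);
    int rc = pthread_create(thread, attr, profiled_trampoline, ps);
    if (rc != 0)
        free(ps);
    return rc;
}

/* Example worker (hypothetical): sums 1..1000 and stores the result. */
static void *sum_worker(void *arg)
{
    long *out = arg;
    long total = 0;
    for (long i = 1; i <= 1000; i++)
        total += i;
    *out = total;
    return NULL;
}
```

To try it, call `profiled_pthread_create` wherever the program currently calls `pthread_create`, build with `gcc -pg -pthread`, run the program, and then run gprof on the resulting `gmon.out` as usual.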
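Have you considered pstack? It works well with multiple threads, and it is good for finding performance issues with the stackshot method: take a handful of stack snapshots while the program is running slowly and look at which calls keep showing up. gprof is what it is, but most likely you can do better.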