Rolling your own simple profiler is not that hard. Insert into main():
int main() { profileCpuUsage(1);
Where:
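A minimal sketch of what such a function could look like follows. It is an illustration under stated assumptions, not a definitive implementation: the calling convention (a positive slice number closes the running slice and starts that one; zero or a negative number closes the running slice and prints the totals), the `MAX_SLICES` limit and the `profileIterations` accessor are all hypothetical, and the stand-in `realElapsedTime()` here uses C11 `timespec_get` purely so the block is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define MAX_SLICES 30  /* hypothetical limit on the number of timers */

/* Stand-in wall-clock timer (C11 timespec_get); swap in a platform-specific
   realElapsedTime() for finer granularity. */
static double realElapsedTime(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static struct {
    int iterations;
    double elapsedTime;
} slices[MAX_SLICES];

/* Assumed convention: a slice in 1..MAX_SLICES-1 closes the running slice and
   starts timing that slice; any other value closes the running slice and
   prints the totals collected so far. */
void profileCpuUsage(int slice)
{
    static int current = 0;  /* 0 means "no slice running" */
    static double startTime;
    const double now = realElapsedTime();

    if (current > 0) {  /* charge the time since the last call to the old slice */
        slices[current].elapsedTime += now - startTime;
        slices[current].iterations++;
    }
    if (slice > 0 && slice < MAX_SLICES) {
        current = slice;
        startTime = now;
    } else {
        current = 0;
        for (int i = 1; i < MAX_SLICES; i++)
            if (slices[i].iterations)
                printf("slice %2d: %6d iterations, %.6f s\n",
                       i, slices[i].iterations, slices[i].elapsedTime);
    }
}

/* Hypothetical accessor, only so the counters can be inspected. */
int profileIterations(int slice)
{
    assert(slice > 0 && slice < MAX_SLICES);
    return slices[slice].iterations;
}
```

With this convention, calling `profileCpuUsage(1)` before the first section, `profileCpuUsage(2)` before the second and `profileCpuUsage(-1)` at the end would print one line per slice that was used.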
Now, admittedly, in your simple example, using profileCpuUsage() doesn't buy you much. It also has the disadvantage of requiring you to instrument your code manually by calling profileCpuUsage() at suitable places.
But the benefits include:
- You can time any section of code, not just procedures.
- It is quick to add and remove, as you do a binary search to find and/or remove code hotspots.
- It focuses only on the code you are interested in.
- Portable!
- KISS
One tricky, non-portable part is defining the realElapsedTime() function so that it has fine enough granularity to give valid times. This generally works for me (using the Windows API under CYGWIN):

    #include <assert.h>
    #include <windows.h>

    double realElapsedTime(void)  // <-- granularity about 50 microsec on test machines
    {
        static LARGE_INTEGER freq, start;
        LARGE_INTEGER count;

        if (!QueryPerformanceCounter(&count))
            assert(0 && "QueryPerformanceCounter");
        if (!freq.QuadPart) {  // one-time initialization
            if (!QueryPerformanceFrequency(&freq))
                assert(0 && "QueryPerformanceFrequency");
            start = count;
        }
        return (double)(count.QuadPart - start.QuadPart) / freq.QuadPart;
    }
For plain Unix, there is a common equivalent:
double realElapsedTime(void)
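Only the signature is shown above. One possible body, assuming a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC` (an assumption on my part; other calls such as `gettimeofday` would serve the same purpose), mirrors the structure of the Windows version:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in <time.h> */
#endif
#include <assert.h>
#include <time.h>

/* Seconds of wall-clock time elapsed since the first call. */
double realElapsedTime(void)
{
    static struct timespec start;
    static int initialized;
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        assert(0 && "clock_gettime");
    if (!initialized) {  /* one-time initialization */
        start = now;
        initialized = 1;
    }
    return (double)(now.tv_sec - start.tv_sec)
         + (double)(now.tv_nsec - start.tv_nsec) / 1e9;
}
```

The first call returns 0 and later calls return the seconds elapsed since then. The monotonic clock is used here so that adjustments to the system time do not distort a measurement.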
realElapsedTime() gives wall-clock time, not process (CPU) time, which is usually what I want.
There are also other less portable methods to achieve finer granularity using RDTSC; see, for example, http://en.wikipedia.org/wiki/Time_Stamp_Counter and its links, but I have not tried them.
Edit: ravenspoint's very nice answer seems to be not too different from mine. And his answer uses nice descriptive strings rather than just ugly numbers, which has often frustrated me. But this can be fixed with only about a dozen extra lines (though that almost doubles the line count!).
Note that we want to avoid any use of malloc(), and I am even a bit doubtful about strcmp(). So the number of slices never grows. And hash collisions are merely flagged rather than resolved: the person doing the profiling can fix them by manually increasing the number of slices from 30, or by changing the description. Untested:
    #include <assert.h>
    #include <string.h>

    #define NUMBER(a) (sizeof(a) / sizeof((a)[0]))  // element count of an array

    static unsigned gethash(const char *str)  // "djb2", for example
    {
        unsigned c, hash = 5381;

        while ((c = *str++))
            hash = ((hash << 5) + hash) + c;  // hash * 33 + c
        return hash;
    }

    void profileCpuUsage(const char *description)
    {
        static struct {
            int iterations;
            double elapsedTime;
            char description[20];  // added!
        } slices[30];

        if (!description) {
            // print stats, but using description, mostly unchanged...
        } else {
            const int slice = gethash(description) % NUMBER(slices);

            if (!slices[slice].description[0]) {  // if new slice
                assert(strlen(description) < sizeof slices[slice].description);
                strcpy(slices[slice].description, description);
            } else if (strcmp(slices[slice].description, description) != 0) {
                // a different description already owns this slot: flag it
                strcpy(slices[slice].description, "!!hash conflict!!");
            }
            // remainder unchanged...
        }
    }
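Since collisions are only flagged at run time, it can help to check a set of descriptions up front. The following is a rough sketch of that idea, not part of the profiler itself: `countSlotCollisions` is a hypothetical helper that uses the same djb2 hash and reports every pair of distinct descriptions that would land in the same slot for a given table size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same "djb2" hash as used by the profiler. */
static unsigned gethash(const char *str)
{
    unsigned c, hash = 5381;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c;  /* hash * 33 + c */
    return hash;
}

/* Hypothetical helper: prints every pair of distinct descriptions that share
   a slot in a table of nslots entries and returns the number of such pairs. */
int countSlotCollisions(const char *const names[], int n, unsigned nslots)
{
    int collisions = 0;

    assert(nslots > 0);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (strcmp(names[i], names[j]) != 0 &&
                gethash(names[i]) % nslots == gethash(names[j]) % nslots) {
                printf("\"%s\" and \"%s\" share slot %u\n",
                       names[i], names[j], gethash(names[i]) % nslots);
                collisions++;
            }
    return collisions;
}
```

If it reports a pair, the fix is the manual one described above: change the table size or reword one of the descriptions.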
Another point is that you will typically want to disable this profiling for release builds; this also applies to ravenspoint's answer. This can be done with the trick of using an evil macro to define it away:
#define profileCpuUsage(foo)
If you do this, you will of course need to add parentheses around the name in the function definition, so that the disabling macro is not expanded there:
void (profileCpuUsage)(const char *description)...
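To see why the parentheses work: a function-like macro is only expanded when its name is immediately followed by `(`, so `(profileCpuUsage)` is left alone while ordinary call sites vanish. A small sketch of the two pieces side by side, where the `profileCalls` counter and `workWithProfiling` are hypothetical names added only to make the effect observable:

```c
#include <assert.h>
#include <stddef.h>

static int profileCalls;  /* hypothetical counter, for demonstration only */

#define profileCpuUsage(foo)  /* function-like macro: call sites expand to nothing */

/* Not expanded: the name is followed by ')' rather than '('. */
void (profileCpuUsage)(const char *description)
{
    (void)description;
    profileCalls++;
}

/* Hypothetical instrumented code. */
int workWithProfiling(void)
{
    profileCpuUsage("section A");  /* becomes an empty statement */
    profileCpuUsage(NULL);         /* likewise removed */
    return profileCalls;
}
```

Here `workWithProfiling()` never reaches the real function, whereas writing `(profileCpuUsage)("x")` explicitly still calls it. A prototype in a header would need the same parentheses, since the macro would otherwise expand there as well.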