Measure application FLOPs with the Linux perf tool

I want to measure the number of floating-point and arithmetic operations performed by an application using "perf", the new command-line interface to the Linux performance counter subsystem. (For testing purposes I use a simple dummy application that I wrote; see below.)

Since I could not find any "perf" events defined for measuring FP and integer operations, I started digging into the raw hardware event codes (for use with -e rNNN, where NNN is the hexadecimal value of the event code). My real problem is that the codes I found for retired instructions (INST_RETIRED) do not distinguish between FP instructions (X87 and MMX/SSE) and other instructions. When I tried to apply the appropriate umasks to that code, I found that "perf" somehow does not understand or support including a umask. I tried:

% perf stat -e rC0 ./a.out 

which gives me the retired instructions, but

 % perf stat -e rC002 ./a.out 

which should give me the executed X87 instructions, reports that I supplied wrong parameters. Maybe so, but what is the correct way to use umasks with raw hardware events in "perf"? More generally, how can I get the exact number of floating-point and integer operations a program executes using the perf tool?

Thanks a lot, Konstantin Boyanov


Here is my test application:

 int main(void){
     float numbers[1000];
     float res1;
     double doubles[1000];
     double res2;
     int i,j=3,k=42;

     for(i=0;i<1000;i++){
         numbers[i] = (i+k)*j;
         doubles[i] = (i+j)*k;
         res1 = numbers[i]/(float)k;
         res2 = doubles[i]/(float)j;
     }
 }
2 answers

The event to use is processor dependent. You can use libpfm4 (http://perfmon2.git.sourceforge.net/git/gitweb-index.cgi) to determine which events are available (using the showevtinfo program) and then use check_events from the same distribution to find the raw codes for an event. My Sandy Bridge processor supports the FP_COMP_OPS_EXE event, which, as I found empirically, closely matches the FLOP count.


I'm not sure about perf, but oprofile has floating-point events for many processors. There may be some overlap, since INST_RETIRED is also a valid oprofile event.
