Counting FLOPS / GFLOPS in a program - CUDA

My application, which multiplies a CRS matrix by a vector (SpMV), is finished, and the only thing left to do is to calculate its FLOPS. In my opinion, it is very difficult to estimate the number of floating-point operations in sparse matrix-vector multiplication, since the number of multiplications per row varies from row to row.

So far I have only measured the running time with "cudaprof" (available in the ./CUDA/bin directory), and that works fine.

Any suggestions or copy-and-paste instructions are appreciated!

1 answer

This is not just your opinion; it is a simple fact that the number of operations for a sparse matrix depends on the data, so you cannot get a meaningful answer without knowing something about the data. That makes it impossible to give a single estimate that holds for every data set.
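Once the data is known, the count for one particular matrix is easy to write down. The sketch below assumes the common convention of counting one multiply and one add per stored nonzero, so a single SpMV costs 2 * nnz operations. The helper names `spmv_flops` and `spmv_gflops` are hypothetical, and `seconds` stands for whatever kernel time you measured with the profiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: FLOP count for one CRS/CSR SpMV, assuming one
// multiply and one add per stored nonzero (2 * nnz in total).
std::size_t spmv_flops(const std::vector<int>& row_ptr) {
    assert(!row_ptr.empty());
    // The last row pointer minus the first is the number of stored
    // nonzeros, for both 0-based and 1-based indexing.
    const std::size_t nnz =
        static_cast<std::size_t>(row_ptr.back() - row_ptr.front());
    return 2 * nnz;
}

// Hypothetical helper: achieved GFLOPS for a measured time in seconds.
double spmv_gflops(const std::vector<int>& row_ptr, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(spmv_flops(row_ptr)) / (seconds * 1e9);
}
```

The result is only valid for the matrix whose row pointer array was passed in; a different sparsity pattern gives a different number.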

This is probably one of those situations where you could think about it for hours (and do a lot of research) to come up with a decent estimate, or you could spend a few minutes writing a version of your existing code that increments a counter every time it performs an operation. Of course, that version will take a while to run (especially if you do not write it in a CUDA-enabled form), but probably much less time than the thinking would take, and when the answer comes out, you don't have to work hard to convince yourself that it is right.
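A minimal sketch of such a counting version, written as plain host code rather than a CUDA kernel. It is not your actual code: the `CsrMatrix` struct, its field names, and the function `spmv_counting` are assumptions made for illustration, and each multiply and each add is counted as one operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CRS/CSR container with 0-based indices; names are illustrative.
struct CsrMatrix {
    std::vector<int> row_ptr;    // size rows + 1
    std::vector<int> col_idx;    // size nnz
    std::vector<double> values;  // size nnz
};

// Computes y = A * x on the host and returns the number of floating-point
// operations that were actually executed.
std::size_t spmv_counting(const CsrMatrix& a,
                          const std::vector<double>& x,
                          std::vector<double>& y) {
    assert(!a.row_ptr.empty());
    const std::size_t rows = a.row_ptr.size() - 1;
    y.assign(rows, 0.0);
    std::size_t flops = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
            flops += 2;  // one multiply and one add
        }
        y[i] = sum;
    }
    return flops;
}
```

Run it once on the same matrix you feed to the GPU, then divide the returned count by the kernel time reported by the profiler to get FLOPS (and by a further 1e9 for GFLOPS).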
