First, if this is for production code, you may want to do some useful work between the second cudaEventRecord and cudaEventSynchronize(). Otherwise you may be reducing your application's ability to overlap GPU and CPU work.
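To illustrate the overlap, here is a minimal sketch. The kernel `scaleKernel` and the host function `doHostWork` are hypothetical placeholders for whatever GPU and CPU work your application really does, and error checking is left out to keep it short. The point is only the ordering: record the stop event, do independent host work, and synchronize when the timing is actually needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the GPU work being timed.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host-side work that does not depend on the kernel's result.
static double doHostWork(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += 1.0 / (i + 1);
    return sum;
}

int main() {
    const int n = 1 << 20;
    float* dData = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMemset(dData, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    scaleKernel<<<(n + 255) / 256, 256>>>(dData, n, 2.0f);
    cudaEventRecord(stop, 0);            // asynchronous: returns without waiting for the GPU

    double hostResult = doHostWork(n);   // CPU work can run while the kernel executes

    cudaEventSynchronize(stop);          // block only now, when the timing is needed
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("kernel time: %f ms, host result: %f\n", ms, hostResult);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dData);
    return 0;
}
```

If cudaEventSynchronize() were called immediately after the second cudaEventRecord, the host thread would sit idle for the duration of the kernel.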
Next, I would separate event creation and destruction from event recording. I'm not sure what they cost, but in general you probably don't want to call cudaEventCreate and cudaEventDestroy too often.
What I would do is create a class like this:
    #include <cassert>
    #include <cuda_runtime.h>

    class EventTimer {
    public:
        // Create both events once, up front.
        EventTimer() : mStarted(false), mStopped(false) {
            cudaEventCreate(&mStart);
            cudaEventCreate(&mStop);
        }
        ~EventTimer() {
            cudaEventDestroy(mStart);
            cudaEventDestroy(mStop);
        }
        // Record the start event in stream s (default stream if omitted).
        void start(cudaStream_t s = 0) {
            cudaEventRecord(mStart, s);
            mStarted = true;
            mStopped = false;
        }
        // Record the stop event in stream s; does not block the host.
        void stop(cudaStream_t s = 0) {
            assert(mStarted);
            cudaEventRecord(mStop, s);
            mStarted = false;
            mStopped = true;
        }
        // Wait for the stop event and return the elapsed time in milliseconds.
        float elapsed() {
            assert(mStopped);
            if (!mStopped) return 0;
            cudaEventSynchronize(mStop);
            float ms = 0;
            cudaEventElapsedTime(&ms, mStart, mStop);
            return ms;
        }
    private:
        bool mStarted, mStopped;
        cudaEvent_t mStart, mStop;
    };
Note: I didn't include cudaSetDevice(). I think that should be left to the code that uses this class, which makes the class more flexible. The user has to guarantee that the same device is active when calling start and stop.
P.S. NVIDIA does not intend CUTIL to be relied on for production code. It is used simply for convenience in our examples and is not as rigorously tested or optimized as the CUDA libraries and compilers themselves. I recommend extracting things like cutilSafeCall() into your own libraries and headers.
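As a rough idea of what such a header of your own might contain, here is a minimal sketch of an error-checking macro. The name `CUDA_CHECK` is made up for this example and is not part of CUTIL or the CUDA toolkit, and exiting on failure is just one possible policy. It relies only on the runtime API calls cudaGetErrorString, cudaMalloc, cudaMemset and cudaFree.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for cutilSafeCall(): evaluate a CUDA runtime call,
// and on failure report the file, line and error string, then exit.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n",         \
                         __FILE__, __LINE__,                          \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    const size_t bytes = 1024 * sizeof(float);
    float* dData = nullptr;

    // Each runtime call is wrapped so a failure is reported where it occurs.
    CUDA_CHECK(cudaMalloc(&dData, bytes));
    CUDA_CHECK(cudaMemset(dData, 0, bytes));
    CUDA_CHECK(cudaFree(dData));
    return 0;
}
```

Library code may prefer to return the error or throw an exception instead of calling std::exit; that choice is yours once the helper lives in your own header.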