CUDA: cudaEvent_t and cudaThreadSynchronize use

I am a bit confused about using cudaEvent_t. I am currently calling clock() like this to measure the duration of a kernel call:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

// Print time difference: ( end - begin )

I am looking for a higher-resolution timer, so I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before recording the time with cudaEventRecord(), or is that redundant?

The reason I ask is that there is another call, cudaEventSynchronize(), that seems to wait until the event has been recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished?


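There are other synchronization functions as well (such as cudaStreamSynchronize). Using events as timers basically comes down to this: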

//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);

//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!

//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

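The important part is to synchronize on event2, because that guarantees everything before it has executed by the time you calculate the elapsed time. Since both events and the kernel are in the same stream (where order is preserved), event1 and the kernel have completed as well.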

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both do more than is needed in this case.
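
For reference, the event-timing pattern above could be turned into a complete program roughly like the sketch below. The kernel scaleKernel, the helper timeKernelMs, the CHECK macro, and the launch configuration are all made up for illustration and stand in for your own code; only the cudaEvent* calls come from the snippet above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel standing in for fooKernel: scales an array in place.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch of scaleKernel on the default stream, in milliseconds.
float timeKernelMs(float *d_data, int n)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    CHECK(cudaEventRecord(start, 0));   // enqueued before the kernel
    scaleKernel<<<grid, block>>>(d_data, n, 2.0f);
    CHECK(cudaGetLastError());          // reports launch errors
    CHECK(cudaEventRecord(stop, 0));    // enqueued after the kernel

    // Block the host until 'stop' has been reached. Stream ordering means
    // 'start' and the kernel have completed by then as well.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    float ms = timeKernelMs(d_data, n);
    std::printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaFree(d_data));
    return 0;
}
```

The first launch of a kernel usually includes one-time startup overhead, so it is common to run one untimed warm-up launch before measuring. Also note that on current toolkits cudaThreadSynchronize is deprecated in favor of cudaDeviceSynchronize.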
