Why does my Hello world program take almost 10 seconds?

Question


I installed the CUDA runtime and drivers, version 7.0, on my workstation (Ubuntu 14.04, 2x Intel Xeon E5 + 4x Tesla K20m). I used the following program to check whether my installation works:

    #include <stdio.h>

    __global__ void helloFromGPU()
    {
        printf("Hello World from GPU!\n");
    }

    int main(int argc, char **argv)
    {
        printf("Hello World from CPU!\n");

        helloFromGPU<<<1, 1>>>();
        printf("Hello World from CPU! Again!\n");

        cudaDeviceSynchronize();
        printf("Hello World from CPU! Yet again!\n");

        return 0;
    }

I get the correct output, but it takes an enormous amount of time:

    $ nvcc hello.cu -O2
    $ time ./hello > /dev/null

    real    0m8.897s
    user    0m0.004s
    sys     0m1.017s

If I delete all the device code, the whole program runs in 0.001 s. So why does my simple program take almost 10 seconds?


c++ c cuda

chris Jul 01 '15 at 12:04


1 answer

talonmies · Accepted Answer · 2015-07-01T13:03:52+0000

The apparently slow execution time of your example is due to the underlying fixed cost of setting up the GPU context.

Because you are running on a platform that supports unified addressing, the CUDA runtime needs to map 64 GB of host RAM and 4 x 5120 MB of GPU memory into a single virtual address space and register it with the Linux kernel.
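To see how much memory is involved on a particular machine, the devices can be queried directly. The following is a minimal sketch and not part of the original answer: it uses `cudaGetDeviceCount` and `cudaGetDeviceProperties` to print each device's `unifiedAddressing` flag and `totalGlobalMem`. The helper `bytes_to_mib` is a hypothetical name introduced only for this illustration.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): convert bytes to MiB.
static double bytes_to_mib(size_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double total_mib = 0.0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        double mib = bytes_to_mib(prop.totalGlobalMem);
        total_mib += mib;
        // unifiedAddressing is 1 when host and device share one address space.
        printf("Device %d: %s, %.0f MiB, unified addressing: %s\n",
               dev, prop.name, mib, prop.unifiedAddressing ? "yes" : "no");
    }
    printf("Total device memory across %d device(s): %.0f MiB\n", count, total_mib);
    return 0;
}
```

On a machine like the one in the question, this would be expected to list four devices, but the exact figures depend on the hardware and driver.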

This requires many kernel API calls, and it is not fast. I would guess that this is the main source of the slow performance you are observing. You should treat it as a fixed startup cost that must be amortized over the lifetime of your application. In real-world applications, a 10-second startup is trivial and of no real importance. In a hello world example, it is not.
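One way to check this explanation is to time context creation separately from the kernel itself. The sketch below is not from the original answer; it assumes that calling `cudaFree(0)` forces the runtime to create the context (a common idiom, since the runtime initializes lazily), and it uses a hypothetical helper `elapsed_ms` for the host-side timing. Note that the second interval still includes any one-time cost of the first kernel launch.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void helloFromGPU()
{
    printf("Hello World from GPU!\n");
}

// Hypothetical helper (not from the original post): milliseconds between two time points.
static double elapsed_ms(std::chrono::steady_clock::time_point start,
                         std::chrono::steady_clock::time_point end)
{
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main()
{
    using steady = std::chrono::steady_clock;

    // Phase 1: force context creation. Assumption: cudaFree(0) does no real
    // work other than triggering the runtime's lazy initialization.
    steady::time_point t0 = steady::now();
    cudaError_t err = cudaFree(0);
    steady::time_point t1 = steady::now();
    if (err != cudaSuccess) {
        fprintf(stderr, "context setup failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Phase 2: launch the kernel and wait for it to finish.
    helloFromGPU<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    steady::time_point t2 = steady::now();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("context setup:        %.1f ms\n", elapsed_ms(t0, t1));
    printf("kernel launch + sync: %.1f ms\n", elapsed_ms(t1, t2));
    return 0;
}
```

If the explanation above holds, nearly all of the wall-clock time should show up in the first interval, with the kernel launch itself taking a small fraction of it.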
