I installed the CUDA runtime and the version 7.0 drivers on my workstation (Ubuntu 14.04, 2x Intel Xeon E5 + 4x Tesla K20m). I used the following program to check whether my installation works:
    #include <stdio.h>

    __global__ void helloFromGPU()
    {
        printf("Hello World from GPU!\n");
    }

    int main(int argc, char **argv)
    {
        printf("Hello World from CPU!\n");
        helloFromGPU<<<1, 1>>>();
        printf("Hello World from CPU! Again!\n");
        cudaDeviceSynchronize();
        printf("Hello World from CPU! Yet again!\n");
        return 0;
    }
I get the correct output, but it takes an enormous amount of time:
    $ nvcc hello.cu -O2 -o hello
    $ time ./hello > /dev/null

    real    0m8.897s
    user    0m0.004s
    sys     0m1.017s
If I delete all the device code, the whole program runs in 0.001 s. So why does my simple program take almost 10 seconds?
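To narrow down where the time goes, the first CUDA runtime call could be timed separately from the kernel launch. The following is only a minimal sketch: it assumes the delay might sit in the first runtime call (driver and context initialisation) rather than in the kernel itself, which the measurements above do not establish. The helper `nowMs` is a hypothetical name introduced for illustration, and `cudaFree(0)` is used only as a cheap call that forces initialisation.

```cuda
#include <stdio.h>
#include <time.h>

__global__ void helloFromGPU()
{
    printf("Hello World from GPU!\n");
}

// Hypothetical helper: host wall-clock time in milliseconds (monotonic clock).
static double nowMs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

int main(void)
{
    // cudaFree(0) frees nothing, but as the first runtime call it
    // forces the driver and the CUDA context to be initialised.
    double t0 = nowMs();
    cudaError_t err = cudaFree(0);
    double t1 = nowMs();
    fprintf(stderr, "first runtime call:   %.1f ms (%s)\n",
            t1 - t0, cudaGetErrorString(err));

    // With initialisation already done, time the launch and the wait.
    helloFromGPU<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    double t2 = nowMs();
    fprintf(stderr, "launch + synchronize: %.1f ms (%s)\n",
            t2 - t1, cudaGetErrorString(err));

    return 0;
}
```

Compiled with `nvcc` in the same way as `hello.cu` (the file name `timing.cu` would be an arbitrary choice), the two lines on stderr should show whether the roughly 9 seconds are spent before the kernel is even launched. The timings go to stderr so that they remain visible when stdout is redirected to `/dev/null`.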
chris