CUDA Newbie - Simple var step not working

Question

I am working on a project with CUDA. To get familiar with it, I wrote the following test code.

    #include <iostream>
    using namespace std;

    __global__ void inc(int *foo) {
        ++(*foo);
    }

    int main() {
        int count = 0, *cuda_count;
        cudaMalloc((void**)&cuda_count, sizeof(int));
        cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
        cout << "count: " << count << '\n';
        inc <<< 100, 25 >>> (&count);
        cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(cuda_count);
        cout << "count: " << count << '\n';
        return 0;
    }

Output:

    count: 0
    count: 0

What is the problem?

Thanks in advance!

+6

c++ cuda

Renato rodrigues Dec 10 '10 at 12:12

3 answers

You must pass cuda_count, not &count, to your kernel function. In addition, all your threads are trying to increment the same memory location. The result of this is undefined: at least one write will succeed, but it is unspecified how many of the others do.

You can prevent this by letting only one thread do the work:

    __global__ void inc(int *foo) {
        if (blockIdx.x == 0 && threadIdx.x == 0)
            ++*foo;
    }

(untested)
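For completeness, here is a minimal sketch of the whole program with the guarded kernel and the device pointer passed to it. The CHECK macro and the run_guarded_inc function are hypothetical helpers, not part of the original post; they are added only so that a failing CUDA call is reported instead of being silently ignored.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): abort on any CUDA error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Only thread 0 of block 0 writes, so there is no race on *foo.
__global__ void inc(int *foo) {
    if (blockIdx.x == 0 && threadIdx.x == 0)
        ++*foo;
}

// Hypothetical wrapper: copies `start` to the device, runs the kernel
// once, and returns the value copied back to the host.
int run_guarded_inc(int start) {
    int count = start, *cuda_count = nullptr;
    CHECK(cudaMalloc((void**)&cuda_count, sizeof(int)));
    CHECK(cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice));
    inc<<<100, 25>>>(cuda_count);   // device pointer, not &count
    CHECK(cudaGetLastError());      // reports launch errors
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    CHECK(cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(cuda_count));
    return count;
}

int main() {
    std::printf("count: %d\n", run_guarded_inc(0));
    return 0;
}
```

With a working device this should print count: 1, because exactly one of the 2500 threads performs the increment. If every thread is meant to contribute, an atomic operation is needed instead of the guard.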

+8

Konrad Rudolph Dec 10 '10 at 12:35

The problem with your code is that you are passing the kernel &count, the address of the host variable, instead of the device pointer cuda_count that you allocated and copied to.

This line

    inc <<< 100, 25 >>> (&count);

should be

    inc <<< 100, 25 >>> (cuda_count);

0

Przemyslaw zych Sep 29 '12 at 7:30

Renato rodrigues · Accepted Answer · 2010-12-10T21:24:45+0000

I have found a solution. I just needed to use an atomic function, that is, a function that runs without interference from other threads. In other words, no other thread can access a specific address until the operation is complete.

Code:

    #include <iostream>
    using namespace std;

    __global__ void inc(int *foo) {
        atomicAdd(foo, 1);
    }

    int main() {
        int count = 0, *cuda_count;
        cudaMalloc((void**)&cuda_count, sizeof(int));
        cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
        cout << "count: " << count << '\n';
        inc <<< 100, 25 >>> (cuda_count);
        cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(cuda_count);
        cout << "count: " << count << '\n';
        return 0;
    }

Output:

    count: 0
    count: 2500

Thank you for making me understand the mistake I made.
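To see what the atomic operation changes, the following sketch runs a plain increment and an atomicAdd increment side by side with the same 100 x 25 launch configuration. The kernel names inc_racy and inc_atomic and the run helper are made up for this illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Plain read-modify-write: threads can overwrite each other's updates.
__global__ void inc_racy(int *foo) {
    ++(*foo);
}

// atomicAdd performs the read-modify-write as one indivisible operation.
__global__ void inc_atomic(int *foo) {
    atomicAdd(foo, 1);
}

// Hypothetical helper: zero a device counter, launch one of the two
// kernels with 100 blocks of 25 threads, and return the final value.
int run(bool use_atomic) {
    int count = 0, *cuda_count = nullptr;
    cudaMalloc((void**)&cuda_count, sizeof(int));
    cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice);
    if (use_atomic)
        inc_atomic<<<100, 25>>>(cuda_count);
    else
        inc_racy<<<100, 25>>>(cuda_count);
    // The blocking copy waits for the kernel to finish.
    cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(cuda_count);
    return count;
}

int main() {
    std::printf("racy:   %d\n", run(false));  // unspecified, between 1 and 2500
    std::printf("atomic: %d\n", run(true));   // 2500
    return 0;
}
```

The racy result can differ between runs and devices because updates from concurrent threads are lost, while the atomic version counts every one of the 2500 threads.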
