No, it doesn't. For example, I had this small kernel for testing atomic add:
kernel void atomicAdd(volatile global int *result){ atomic_add(&result[0], 1); }
It was called from this host code (pyopencl + unittest):
def test_atomic_add(self):
    NDRange = (4, 4)
    result = np.zeros(1, dtype=np.int32)
    out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY, size=result.nbytes)
    self.prog.atomicAdd(self.queue, NDRange, NDRange, out_buf)
    cl.enqueue_copy(self.queue, result, out_buf).wait()
    self.assertEqual(result, 16)
This always returned the correct value when run on my CPU. However, on an ATI HD 5450 the returned value was always garbage.
And if I remember correctly, on an NVIDIA card the first run returned the correct value, 16, but on the following runs the values were 32, 48, etc. It was reusing the same memory location with the old value still stored there.
When I fixed the host code with this line (copying the value 0 to the buffer):
out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY | self.mf.COPY_HOST_PTR, hostbuf=result)
Everything worked fine on every device.
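For anyone who wants to try this outside a test class, here is a rough standalone sketch of the same idea. The function name run_atomic_add, the use of cl.create_some_context to pick a device, and passing None as the local size are my own assumptions and not part of the original test. I also used READ_WRITE instead of WRITE_ONLY here because atomic_add both reads and writes the buffer. The important part is unchanged: the buffer is created from a host array that already holds 0.

```python
try:
    import numpy as np
    import pyopencl as cl
except ImportError:  # let the file load on machines without pyopencl
    np = cl = None

KERNEL_SRC = """
kernel void atomicAdd(volatile global int *result) {
    atomic_add(&result[0], 1);
}
"""


def run_atomic_add(global_size=(4, 4)):
    """Run the kernel once and return the counter read back from the device."""
    if cl is None:
        raise RuntimeError("pyopencl and numpy are required")

    # Assumption: take whatever device pyopencl picks without prompting.
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    prog = cl.Program(ctx, KERNEL_SRC).build()
    mf = cl.mem_flags

    # The host array holds 0, and COPY_HOST_PTR copies it into the new
    # buffer, so the counter starts from a known value on every device.
    result = np.zeros(1, dtype=np.int32)
    out_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=result)

    # One work-item per element of global_size; each adds 1 to result[0].
    prog.atomicAdd(queue, global_size, None, out_buf)
    cl.enqueue_copy(queue, result, out_buf).wait()
    return int(result[0])


if __name__ == "__main__":
    print(run_atomic_add())
```

Because a fresh, explicitly initialized buffer is created on each call, repeated calls should not accumulate the way the NVIDIA runs described above did.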
CaptainObvious