PyOpenCL: difference between to_device and Buffer

Given

 import pyopencl as cl
 import pyopencl.array as cl_array
 import numpy
 a = numpy.random.rand(50000).astype(numpy.float32)
 mf = cl.mem_flags

What's the difference between

 a_gpu = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a) 

and

 a_gpu = cl_array.to_device(self.ctx, self.queue, a) 

?

And what is the difference between

 result = numpy.empty_like(a)
 cl.enqueue_copy(self.queue, result, result_gpu)

and

 result = result_gpu.get() 

?

2 answers

A Buffer is CL's version of malloc: a raw block of device memory. pyopencl.array.Array is a workalike of numpy arrays on the compute device.

So with the second version from the first part of your question, you can write a_gpu + 2 to get a new array with 2 added to each element of your array. With a Buffer, PyOpenCL sees only a bag of bytes and cannot perform such an operation.

The second part of your question is the same in reverse: if you have a PyOpenCL array, .get() copies the data back and converts it into a (host-based) numpy array. Since numpy arrays are one of the more convenient ways to get contiguous memory in Python, the enqueue_copy variant also ends up in a numpy array. Note, however, that you could have copied the data into an array of any size (as long as it's big enough) and any type: the copy is performed as a bag of bytes, whereas .get() makes sure you get the same size and type back on the host.

Bonus fact: there is, of course, a Buffer underlying each PyOpenCL array. You can get at it through the .data attribute.


To answer the first question, Buffer(hostbuf=...) can be called with anything that implements the buffer interface. pyopencl.array.to_device(...) must be called with an ndarray. An ndarray implements the buffer interface, so it works in either place. However, only hostbuf=... would be expected to work with, for example, a bytearray (which also implements the buffer interface). I have not confirmed this, but it appears to be what the docs suggest.
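What "implements the buffer interface" means can be checked on the host without any OpenCL involved. The sketch below only uses `memoryview`, which accepts exactly those objects that expose the buffer protocol; it does not test what PyOpenCL itself accepts, and the variable names are illustrative.

```python
import numpy as np

a = np.random.rand(4).astype(np.float32)
raw = bytearray(a.tobytes())  # same bytes, but no dtype or shape attached

# memoryview() succeeds only for objects that implement the buffer protocol.
view_of_array = memoryview(a)
view_of_bytes = memoryview(raw)

# Both expose the same 16 bytes; only the item format differs.
array_info = (view_of_array.format, view_of_array.nbytes)
bytes_info = (view_of_bytes.format, view_of_bytes.nbytes)

# Only the ndarray carries the dtype and shape a device array would need.
is_ndarray = (isinstance(a, np.ndarray), isinstance(raw, np.ndarray))
```

Both objects could therefore serve as a source of raw bytes, but only the ndarray says how those bytes are laid out as typed elements.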

For the second question, I'm not sure what type result_gpu is supposed to be when you call get() on it (did you mean Buffer.get_host_array()?). In any case, enqueue_copy() works between combinations of Buffer, Image and host, can take offsets and regions, and can be asynchronous (with is_blocking=False). I think these capabilities are only available this way, whereas get() will block and return the entire buffer.

