clEnqueueNDRangeKernel fails with CL_INVALID_MEM_OBJECT (-38)

I use the C++ bindings for OpenCL, and when launching one of my kernels I get a cl::Error reporting -38 (CL_INVALID_MEM_OBJECT) from clEnqueueNDRangeKernel.

This error is not listed among the possible errors of clEnqueueNDRangeKernel. The notify callback gives me the following output:

CL_INVALID_MEM_OBJECT error executing CL_COMMAND_NDRANGE_KERNEL on a GeForce GTX 560 (device 0).

I have yet to find a minimal example demonstrating this behavior.

What can cause such an error when calling this function?

Searching with Google, I only found this answer. It claims that I need to call setKernelArg again for an attached memory object if it has been updated. (At least that is my interpretation; the answer does not explain what "updated" means.) However, I doubt that this is correct, although I cannot prove it. Does anyone know of an official source?
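For reference, the OpenCL specification states that clSetKernelArg copies the argument data pointed to by arg_value, so for a buffer argument the kernel keeps the cl_mem handle value as it was at the time of the call. Under that rule, writing new data into the same buffer needs no new setArg, whereas making the host variable refer to a different buffer object does. The sketch below models this in plain C++ with hypothetical MockDevice and MockKernel types, where integers stand in for cl_mem handles; it is a model of the rule, not the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, NOT the OpenCL API: "device memory" is a table of
// float arrays, and an int index plays the role of a cl_mem handle.
struct MockDevice {
    std::vector<std::vector<float>> buffers;

    int createBuffer(std::size_t n) {
        buffers.emplace_back(n, 0.0f);  // new zero-filled buffer
        return static_cast<int>(buffers.size()) - 1;
    }
    void write(int handle, std::size_t i, float value) { buffers[handle][i] = value; }
};

// Modeled on clSetKernelArg: the handle *value* is copied at call time.
struct MockKernel {
    int arg = -1;

    void setArg(int handle) { arg = handle; }
    // Stand-in for running the kernel: reads element 0 of its bound buffer.
    float run(const MockDevice& dev) const { return dev.buffers[arg][0]; }
};
```

In this model, new contents in the bound buffer are visible without another setArg, but a new buffer is only used after setArg is called again.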

Update

After some testing, I found that adding a __global const float* parameter to the kernel introduced the error. I also found that the error occurs every time if I call clSetKernelArg for this new argument after another (already existing) argument. If I set it before the other argument, it works every other time. Of course, this is not a solution, since I need to be able to set the argument at any time.

Update 2

I noticed that stepping through the code in the debugger "re-introduces" the error in the version where I set the new argument before the other one. (That is, the error then occurs every time.)

Could this be some kind of race condition? I do not use multithreading myself, but the debugger shows 7 threads, which may come from Qt or OpenCL.

Minimal working example

    #define __CL_ENABLE_EXCEPTIONS  // make the bindings throw cl::Error
    #include <CL/cl.hpp>
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    #define STRINGIFY(x) #x

    std::string kernel = STRINGIFY(
        __kernel void apply(__global const float *param1) { }
    );

    template <class T>
    cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
                         cl_mem_flags flags = CL_MEM_READ_ONLY)
    {
        return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
                          data.size() * sizeof(data[0]),
                          const_cast<T*>(&data[0]));
    }

    int main()
    {
        std::vector<cl::Platform> clPlatforms;
        cl::Platform::get(&clPlatforms);
        cl_context_properties props[] = {
            CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](), 0};
        cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);

        std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
        if (devices.empty()) {
            std::cerr << "No devices found!\n";
            exit(-1);
        }
        cl::Device clDevice = devices[0];
        cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);

        cl::Program program(clContext,
            cl::Program::Sources(1, std::make_pair(kernel.c_str(), kernel.size())));
        program.build(devices);
        cl::Kernel kernel(program, "apply");

        // this introduces the error
        kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100)));

        // the error is triggered here
        clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
    }
Answer

The problem was that I bound the buffer to the kernel, assuming the kernel would keep the buffer alive. I then destroyed all references to the cl::Buffer/cl::Memory object, which made the OpenCL implementation release the buffer.
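The lifetime problem can be modeled without a GPU. The sketch below uses hypothetical, simplified stand-ins rather than the real API: MockBuffer retains on copy and releases in its destructor, in the spirit of the cl::Buffer wrapper, while MockKernel::setArg only records the handle, matching the rule that a kernel does not retain its arguments. To keep the demo deterministic, the mock enqueue reports -38 when the handle has already been freed; a real driver may instead touch freed memory, which is undefined behavior.

```cpp
#include <cassert>
#include <map>

// Hypothetical model, NOT the OpenCL API: a handle is an int id whose
// reference count lives in a global table, like a cl_mem object.
using MockMem = int;
constexpr int MOCK_SUCCESS = 0;
constexpr int MOCK_INVALID_MEM_OBJECT = -38;  // same value as CL_INVALID_MEM_OBJECT

std::map<MockMem, int>& liveObjects() {
    static std::map<MockMem, int> refcounts;  // handle -> reference count
    return refcounts;
}

MockMem mockCreateBuffer() {
    static MockMem next = 1;
    liveObjects()[next] = 1;  // new objects start with one reference
    return next++;
}
void mockRetain(MockMem m) { ++liveObjects()[m]; }
void mockRelease(MockMem m) {
    if (--liveObjects()[m] == 0) liveObjects().erase(m);  // last reference: freed
}

// RAII wrapper modeled on cl::Buffer: copies retain, the destructor releases.
class MockBuffer {
public:
    MockBuffer() : mem_(mockCreateBuffer()) {}
    MockBuffer(const MockBuffer& other) : mem_(other.mem_) { mockRetain(mem_); }
    MockBuffer& operator=(const MockBuffer& other) {
        if (this != &other) {
            mockRetain(other.mem_);
            mockRelease(mem_);
            mem_ = other.mem_;
        }
        return *this;
    }
    ~MockBuffer() { mockRelease(mem_); }
    MockMem get() const { return mem_; }

private:
    MockMem mem_;
};

// Modeled on clSetKernelArg: the kernel stores the handle but does not retain it.
class MockKernel {
public:
    void setArg(const MockBuffer& buffer) { arg_ = buffer.get(); }
    // Stand-in for enqueueNDRangeKernel: reports -38 if the argument was freed.
    int enqueue() const {
        return liveObjects().count(arg_) ? MOCK_SUCCESS : MOCK_INVALID_MEM_OBJECT;
    }

private:
    MockMem arg_ = 0;  // 0 is never a valid handle in this model
};
```

Passing genBuffer(...) straight to setArg corresponds to binding a temporary: it is destroyed at the end of the setArg statement, so its last reference is released before the kernel is enqueued.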


After running my program through valgrind, I noticed that opencl.so was accessing the memory of an object previously freed in the cl::Buffer destructor. Reading the documentation of clSetKernelArg, I found:

Users may not rely on a kernel object to retain objects specified as argument values to the kernel.

The non-deterministic behavior is clearly the result of the driver accessing freed memory, which is undefined behavior.

Fixed MWE

    #define __CL_ENABLE_EXCEPTIONS  // make the bindings throw cl::Error
    #include <CL/cl.hpp>
    #include <cstdlib>
    #include <iostream>
    #include <string>
    #include <vector>

    #define STRINGIFY(x) #x

    std::string kernel = STRINGIFY(
        __kernel void apply(__global const float *param1) { }
    );

    template <class T>
    cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
                         cl_mem_flags flags = CL_MEM_READ_ONLY)
    {
        return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
                          data.size() * sizeof(data[0]),
                          const_cast<T*>(&data[0]));
    }

    int main()
    {
        std::vector<cl::Platform> clPlatforms;
        cl::Platform::get(&clPlatforms);
        cl_context_properties props[] = {
            CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](), 0};
        cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);

        std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
        if (devices.empty()) {
            std::cerr << "No devices found!\n";
            exit(-1);
        }
        cl::Device clDevice = devices[0];
        cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);

        cl::Program program(clContext,
            cl::Program::Sources(1, std::make_pair(kernel.c_str(), kernel.size())));
        program.build(devices);
        cl::Kernel kernel(program, "apply");

        // this version triggers the error:
        // kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100)));

        // this is how it is done correctly: the named buffer outlives the enqueue
        cl::Buffer buffer = genBuffer(clContext, std::vector<cl_float>(100));
        kernel.setArg(0, buffer);
        clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
    }
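Keeping a named cl::Buffer in scope works, but each buffer then has to stay alive for as long as the kernel may still be enqueued with it. An alternative is to let the object that owns the kernel also own the buffers passed to it; in the real C++ bindings, copying a cl::Buffer retains the underlying cl_mem. The sketch below illustrates that idea with a hypothetical OwningKernel class, using std::shared_ptr as a stand-in for the OpenCL reference count and a raw pointer for the non-retaining handle a kernel stores. It is a model of the ownership pattern, not OpenCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical model, NOT the OpenCL API: std::shared_ptr stands in for the
// reference counting of cl::Buffer, and a raw pointer stands in for the
// cl_mem handle that a kernel stores without retaining it.
using Buffer = std::shared_ptr<std::vector<float>>;

Buffer makeBuffer(std::size_t n) { return std::make_shared<std::vector<float>>(n); }

// Plays the role of a kernel object: keeps non-owning argument handles.
struct RawKernel {
    std::map<unsigned, const std::vector<float>*> args;
    void setArg(unsigned index, const Buffer& b) { args[index] = b.get(); }
};

// Hypothetical wrapper: owns a reference to every buffer it passes to the
// kernel, so a buffer stays alive while it is still bound as an argument.
class OwningKernel {
public:
    void setArg(unsigned index, const Buffer& b) {
        owned_[index] = b;         // retain; drops the reference to any old buffer
        kernel_.setArg(index, b);  // then bind the handle as before
    }
    const RawKernel& raw() const { return kernel_; }

private:
    RawKernel kernel_;
    std::map<unsigned, Buffer> owned_;
};
```

Rebinding an index with setArg releases the wrapper's reference to the previous buffer, so a buffer is freed only once neither the wrapper nor any other owner refers to it.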