CL_OUT_OF_RESOURCES for 2 million floats with 1GB VRAM?

It seems like 2 million floats should be no big deal: only 8 MB out of 1 GB of GPU RAM. I am able to allocate that much at times, and sometimes more, with no trouble. I get CL_OUT_OF_RESOURCES when I call clEnqueueReadBuffer, which seems odd. Can I sniff out where the trouble really started? OpenCL shouldn't be failing at clEnqueueReadBuffer, should it? Shouldn't it fail when I allocate the data? Is there a way to get more details than just an error code? It would be great if I could see how much VRAM was allocated when OpenCL reported CL_OUT_OF_RESOURCES.

+5
4 answers

Not all of the available memory can necessarily be supplied to a single allocation request. Read up on heap fragmentation (1, 2, 3) to learn why the largest allocation that can succeed is bounded by the largest contiguous block of free memory, and how blocks get split into smaller pieces as memory is used.

It is not that the resource is exhausted. The allocator simply cannot find a single piece large enough to satisfy your request.
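To make the fragmentation point concrete, here is a toy model in plain C. It is only a sketch: the heap is an array of equal-sized blocks, and the function names (`total_free`, `largest_free_run`, `can_allocate`) are made up for illustration. It says nothing about how any particular OpenCL driver actually manages VRAM.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap: one byte per fixed-size block, 1 = in use, 0 = free. */

/* Count every free block, contiguous or not. */
size_t total_free(const unsigned char *used, size_t n) {
    size_t free_blocks = 0;
    for (size_t i = 0; i < n; ++i)
        if (!used[i]) ++free_blocks;
    return free_blocks;
}

/* Length of the longest run of consecutive free blocks. */
size_t largest_free_run(const unsigned char *used, size_t n) {
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; ++i) {
        run = used[i] ? 0 : run + 1;
        if (run > best) best = run;
    }
    return best;
}

/* A contiguous request succeeds only if a long enough run exists. */
int can_allocate(const unsigned char *used, size_t n, size_t want) {
    return want <= largest_free_run(used, n);
}
```

With the layout `{1,0,0,1,0,0,0,1}` five blocks are free in total, yet a request for four contiguous blocks fails because the longest free run is three.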

+3

I had the same problem as you (it took me all day to fix). I am sure that other people with the same problem will stumble on this, which is why I am posting on this old question.

You probably did not check the maximum workgroup size of the kernel.

Here's how you do it:

size_t kernel_work_group_size;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &kernel_work_group_size, NULL);

My devices (2x NVIDIA GTX 460 and an Intel i7 CPU) support a maximum workgroup size of 1024, but the code above returns around 500 when I pass it my path tracing kernel. When I used a workgroup size of 1024, it obviously failed and gave me the CL_OUT_OF_RESOURCES error.

The more complex your kernel, the smaller its maximum workgroup size becomes (or at least that is what I experienced).
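Once you have the per-kernel limit, the sizes passed to clEnqueueNDRangeKernel have to respect it. The following is a minimal sketch of that arithmetic in plain C. The helper names `clamp_local_size` and `round_up_global` are hypothetical, and the rounding step assumes an OpenCL 1.x style runtime where the global size must be a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp the workgroup size you would like to use to the limit that
 * CL_KERNEL_WORK_GROUP_SIZE reported for this particular kernel. */
size_t clamp_local_size(size_t requested, size_t kernel_max) {
    assert(requested > 0 && kernel_max > 0);
    return requested < kernel_max ? requested : kernel_max;
}

/* Round the number of work-items up to the next multiple of the
 * local size (assumption: the runtime requires an exact multiple). */
size_t round_up_global(size_t work_items, size_t local) {
    assert(local > 0);
    return ((work_items + local - 1) / local) * local;
}
```

For example, with a requested size of 1024 and a kernel limit of 512 (an illustrative value), the local size becomes 512 and 2,000,000 work-items are padded to 2,000,384. The kernel then needs a guard such as `if (gid >= n) return;` so that the padding work-items do not touch memory past the end of the buffers.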

Edit:
I just realized that you said "clEnqueueReadBuffer" instead of "clEnqueueNDRangeKernel" ...
My answer was related to clEnqueueNDRangeKernel.
I'm sorry for the mistake.
I hope this is still useful for other people.

+7

From another source:

- calling clFinish() gets you the error status for the computation (rather than getting it later, when you try to read the data back).
- the "out of resources" error can also be caused by a 5 s timeout if the (NVidia) card is also being used as a display.
- it can also appear when you have pointer errors in your kernel.

A follow-up suggests running the kernel on the CPU first to make sure you are not making out-of-bounds memory accesses.

+5

Out-of-bounds accesses in a kernel are usually silent (there is still no error at the call that enqueues the kernel).

However, if you later try to read the kernel result using clEnqueueReadBuffer(), this error shows up. It indicates that something went wrong during kernel execution.

Check your kernel code for out-of-bounds reads and writes.
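One way to hunt for such accesses without a GPU is to port the kernel body to plain C and run it once per work-item through a bounds-checked accessor. The sketch below does that by hand. It is not an OpenCL CPU device, and `checked_buf`, `cb_load`, `cb_store`, `kernel_body` and `run_on_cpu` are all hypothetical names. The example kernel body deliberately reads `in[gid + 1]`, which runs off the end for the last work-item.

```c
#include <assert.h>
#include <stddef.h>

/* A buffer view that counts out-of-range accesses instead of doing them. */
typedef struct {
    float *data;
    size_t len;
    size_t violations;
} checked_buf;

static float cb_load(checked_buf *b, size_t i) {
    if (i >= b->len) { b->violations++; return 0.0f; }
    return b->data[i];
}

static void cb_store(checked_buf *b, size_t i, float v) {
    if (i >= b->len) { b->violations++; return; }
    b->data[i] = v;
}

/* Hypothetical kernel body: out[gid] = in[gid] + in[gid + 1].
 * The in[gid + 1] read is out of bounds when gid == len - 1. */
static void kernel_body(size_t gid, checked_buf *in, checked_buf *out) {
    cb_store(out, gid, cb_load(in, gid) + cb_load(in, gid + 1));
}

/* Run the body serially for every global id; return total violations. */
size_t run_on_cpu(size_t global_size, checked_buf *in, checked_buf *out) {
    for (size_t gid = 0; gid < global_size; ++gid)
        kernel_body(gid, in, out);
    return in->violations + out->violations;
}
```

Running this over four work-items with four-element buffers reports one violation, which points straight at the `gid + 1` read. On the GPU the same access would go unnoticed until a later call fails.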

+1
