As the title says, when I start my OpenCL kernel the whole screen stops redrawing (the image shown on the monitor stays unchanged until my program finishes its calculations - this is true even if I unplug the monitor from my notebook and plug it back in, the same image is still displayed), and the computer does not respond to mouse movements - the cursor stays in the same position.
I have no idea why this happens. Is it a bug in my program, or is this standard behavior?
While searching on Google I found this thread on the AMD forum, where some people suggested that this is normal, since the GPU cannot refresh the screen while it is busy with computations.
If this is true, is there a way around this?
My kernel computation can take up to several minutes, and having my computer practically unusable for that whole time really hurts.
EDIT1: this is my current setup:
- The graphics card is an ATI Mobility Radeon HD 5650 with 512 MB of memory and the latest Catalyst beta driver from AMD.
- The graphics are switchable (Intel integrated / discrete ATI card), but I disabled switching in the BIOS, because otherwise I could not get the driver running on Ubuntu.
- The operating system is Ubuntu 12.10 (64-bit), but this also happens on Windows 7 (64-bit).
- I have an external monitor connected via HDMI (but the built-in laptop screen freezes too, so that should not be the problem).
EDIT2: After a day of playing with my code, I took the tips from your answers and changed my algorithm to something like this (in pseudocode):
    for (cl_ulong chunk = 0; chunk < num_chunks * chunk_size; chunk += chunk_size)
    {
        clSetKernelArg(/* ... pass the offset of the current chunk to the kernel ... */);
        clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);
        /* blocking read: copy this chunk's results into the host output array */
        clEnqueueReadBuffer(cmd_queue, of_buf, CL_TRUE, 0, chunk_size, output + chunk, 0, NULL, NULL);
    }
So now I split the entire workload on the host and send it to the GPU in chunks. For each chunk of data I set new kernel arguments and enqueue the kernel again, and the results I read back are written into the output array at the correct offset.
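For reference, the kernel side now looks roughly like this (simplified, with placeholder names - compute_chunk, in, out, offset are not my real identifiers, and the per-element operation is just a dummy): each launch covers only one chunk's worth of work-items, and the offset argument set in the loop above tells the kernel where its chunk starts in the input buffer.

    /* Sketch only - placeholder names and a dummy per-element operation. */
    __kernel void compute_chunk(__global const float *in,
                                __global float *out,
                                const ulong offset)
    {
        size_t gid = get_global_id(0);   /* 0 .. chunk_size-1 within this launch */
        float x = in[offset + gid];      /* this work-item's element of the current chunk */
        out[gid] = x * x;                /* placeholder for the real per-element work */
    }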
Is this the intended way to do it, i.e. is the computation supposed to be split up like this?
It does seem to fix the freezing problem, and as a bonus I can now process data sets that are much larger than the available GPU memory, but I still need to do some performance measurements to find out what a good chunk size is ...
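To find that chunk size, my plan is to time each launch with OpenCL profiling events, roughly like this (sketch only - it assumes cmd_queue was created with the CL_QUEUE_PROFILING_ENABLE property, that <stdio.h> and <CL/cl.h> are included, and it reuses the variable names from the loop above):

    cl_event evt;
    cl_ulong t_start, t_end;

    /* same launch as in the loop, but now keeping the event for profiling */
    clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);                 /* wait until this chunk has finished */

    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof(t_end),   &t_end,   NULL);
    clReleaseEvent(evt);

    printf("chunk took %.3f ms\n", (t_end - t_start) * 1e-6);  /* timestamps are in nanoseconds */

The idea is that the kernel time per chunk is roughly how long the display stays blocked, so smaller chunks should keep the screen more responsive, at the cost of more launch and read-back overhead.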