OpenCL calculation freezes screen

As the title says, when I start my OpenCL kernel the whole screen stops redrawing: the image shown on the monitor stays unchanged until my program has finished its calculations. This is true even if I unplug the monitor from my notebook and plug it back in; the same image is always displayed. The computer also stops responding to mouse movement, and the cursor stays in the same position.

I do not know why this is happening. Is it a bug in my program, or is this standard behavior?

While searching on Google, I found this thread on the AMD forum, where some people suggested that this is normal, since the GPU cannot refresh the screen while it is busy with computations.

If this is true, is there a way around this?

My kernel computation can take up to several minutes, and having my computer practically unusable for that whole time really hurts.

EDIT1: this is my current setup:

  • The graphics card is an ATI Mobility Radeon HD 5650 with 512 MB of memory, running the latest Catalyst beta driver from AMD.
  • The graphics are switchable (integrated Intel / discrete ATI card), but I disabled switching in the BIOS because otherwise I could not get the driver running on Ubuntu.
  • The operating system is Ubuntu 12.10 (64-bit), but this also happens on Windows 7 (64-bit).
  • I have an external monitor connected via HDMI (but the laptop screen freezes too, so that should not be the problem).

EDIT2: after a day of playing with my code, I took the tips from your answers and changed my algorithm to something like this (in pseudocode):

    for (cl_ulong chunk = 0; chunk < total_size; chunk += chunk_size) {
        /* set kernel arguments that differ for each chunk */
        clSetKernelArg(/* ... */);

        /* schedule the kernel for the next execution */
        clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
                               &global_work_size, NULL, 0, NULL, NULL);

        /* read the results back from the kernel and append them
           to the output array on the host (blocking read) */
        clEnqueueReadBuffer(cmd_queue, of_buf, CL_TRUE, 0, chunk_size,
                            output + chunk, 0, NULL, NULL);
    }

So now I split the entire workload on the host and send it to the GPU in pieces. For each piece of data I enqueue a new kernel, and the results it produces are appended to the output array at the correct offset.

Is this the way such a computation is supposed to be split up?

This seems to fix the freezing problem, and as a bonus I can now process data sets much larger than the available GPU memory. I still have to do some performance measurements to figure out what a good chunk size is, though...

3 answers

Whenever the GPU is running an OpenCL kernel, it is completely dedicated to OpenCL. Some modern NVIDIA GPUs are an exception (I think starting from the GeForce GTX 500 series), which can run multiple kernels concurrently if those kernels do not use all available compute units.

Your options are to split your computation into several short kernel calls, which is the best all-round solution since it will work even on single-GPU machines, or to invest in a cheap second GPU to drive the display.

If you intend to run long kernels on your GPU, you must either disable Timeout Detection and Recovery (TDR) for GPUs, or make the timeout delay longer than the maximum kernel runtime (the latter is better, since errors can still be caught); see here for how to do it.
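On Windows, TDR is controlled by registry values under the `GraphicsDrivers` key documented by Microsoft. As an illustration (the 10-minute value is just an example, pick one longer than your worst-case kernel; a reboot is required after the change):

```reg
Windows Registry Editor Version 5.00

; TdrDelay: seconds the GPU may be unresponsive before a reset.
; 0x258 = 600 seconds = 10 minutes. Setting TdrLevel to dword:0
; instead would disable detection entirely, which is riskier.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:00000258
```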


Every time my display froze, or I got "Display driver stopped responding and has recovered", it was caused by a bug in my code. Such a bug can freeze the whole system, and the only thing left to do is a hard reset. Because of this, I now develop on the CPU first: it never takes down my system, and it is easier to debug that way, since I can use printf. Once the code runs without errors on the CPU, I try it on the GPU.
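The nice thing about this workflow is that moving between CPU and GPU is a one-line change in the host code. A sketch of what that might look like (not compiled here; it assumes an OpenCL SDK with both device types available, and all error checking is omitted):

```c
#include <CL/cl.h>

/* During development: CL_DEVICE_TYPE_CPU, so a runaway kernel cannot
   freeze the display and kernel printf output is easy to capture.
   For production: switch this one define to CL_DEVICE_TYPE_GPU. */
#define DEV_TYPE CL_DEVICE_TYPE_CPU

static cl_context make_context(cl_device_id *device_out) {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, DEV_TYPE, 1, device_out, NULL);
    return clCreateContext(NULL, 1, device_out, NULL, NULL, NULL);
}
```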


I am new to OpenCL and ran into a similar problem. I found that short computations work fine, but longer ones freeze the mouse cursor. In my case, Windows leaves a yellow triangle in the tray area and puts a message in the event log saying "the display driver has stopped responding and has recovered". The solution I found is to break the computation into small parts that take no more than a couple of seconds each. They run back to back, but apparently this gives the display driver enough time to keep it happy. If I set global_work_size to a value high enough to maximize throughput, the video response is very sluggish, but the driver restart/recovery problem never occurs.

