I'm just starting to play with OpenCL, and I'm stuck on how to structure a program efficiently (mainly by avoiding a lot of data transfer to and from the GPU, or wherever the work is being done).
What I'm trying to do is this. Given:
v = r*i + b*j + g*k
I know v for different values of r, g and b, but i, j and k are unknown. I want to calculate reasonable values for i/j/k using brute force.
In other words, I have a bunch of "raw" RGB pixel values, and I have a desaturated version of these colors. I do not know the weights (i/j/k) that were used to calculate the desaturated values.
My initial plan was as follows:
Load the data into CL buffers (the input r/g/b values and the expected output).
Have a kernel that takes the three candidate matrix values and the various pixel data buffers.
It then computes v = r*i + b*j + g*k, subtracts the known value of v from the result, and stores the difference in a "score" buffer.
Another kernel calculates the RMS error over that buffer (if the difference is zero for all input values, the values for i/j/k are "correct").
I have this working (written using Python and PyCL, the code is here), but I'm wondering how I can parallelize more of the work (by trying multiple i/j/k values at once).
As it stands, I have 4 read-only buffers (3 for the input values, 1 for the expected values), but I need a separate score buffer for each i/j/k combination.
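To make the buffer question concrete, here is a minimal sketch in plain Python (not OpenCL, and not the PyCL code linked above) of one possible layout: a single score buffer with one slot per i/j/k combination, where each slot is filled by its own work-item that loops over all pixels. The function names `decode_candidate`, `score_candidate` and `best_candidate` are hypothetical, and the weights follow the r, g, b ordering used in the Python version further down. Whether this maps well onto a particular GPU is an assumption that would need measuring.

```python
import math


def decode_candidate(index, divisions):
    """Map a flat work-item id to three candidate weights in [0, 1]."""
    step = 1.0 / (divisions - 1)
    wi = index // (divisions * divisions)
    wj = (index // divisions) % divisions
    wk = index % divisions
    return (wi * step, wj * step, wk * step)


def score_candidate(index, divisions, in_r, in_g, in_b, expected):
    """What a single work-item would do: RMS error for one candidate."""
    wi, wj, wk = decode_candidate(index, divisions)
    total = 0.0
    for r, g, b, v in zip(in_r, in_g, in_b, expected):
        diff = (r * wi + g * wj + b * wk) - v
        total += diff * diff
    return math.sqrt(total / len(in_r))


def best_candidate(divisions, in_r, in_g, in_b, expected):
    """Fill one score slot per candidate, then pick the lowest."""
    n = divisions ** 3
    # On a device, this list would be the single output buffer and the
    # loop below would be the global work size (one work-item per slot).
    scores = [score_candidate(c, divisions, in_r, in_g, in_b, expected)
              for c in range(n)]
    best = min(range(n), key=scores.__getitem__)
    return decode_candidate(best, divisions), scores[best]
```

With this layout only the four input buffers go to the device once, and a single buffer of `divisions ** 3` scores comes back, instead of one per-pixel score buffer per combination.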
Another problem is that the RMS calculation is the slowest part, because it is effectively single-threaded (it sums all the values in "score" and takes the sqrt() of the total).
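For reference, the usual way to parallelize a sum like this is a pairwise (tree) reduction: each pass adds the upper half of the buffer onto the lower half, so the number of passes grows with the logarithm of the buffer length rather than linearly. The sketch below emulates that in plain Python; `tree_sum` and `rms_from_squared_errors` are hypothetical names, and the inner loop stands in for what would be independent work-items on a device.

```python
import math


def tree_sum(values):
    """Sum values by repeatedly folding the upper half onto the lower half."""
    buf = list(values)
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2
        # Every iteration of this inner loop touches a distinct pair of
        # slots, so on a GPU each index could be a separate work-item.
        for idx in range(n // 2):
            buf[idx] += buf[idx + half]
        n = half
    return buf[0] if buf else 0.0


def rms_from_squared_errors(squared_errors):
    """Root of the mean of a non-empty list of squared errors."""
    return math.sqrt(tree_sum(squared_errors) / len(squared_errors))
```

Note that a pairwise sum adds floating-point values in a different order than a simple loop, so the result may differ from the sequential one in the last few bits.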
Basically, I wonder if there is a reasonable way to structure such a program.
This seems like a task well suited to OpenCL. I hope the description of my goal was not too confusing! As already mentioned, my current code is here, and in case it is clearer, this is a Python version of what I am trying to do:
    import sys
    import math
    import random


    def make_test_data(w = 128, h = 128):
        in_r, in_g, in_b = [], [], []

        print "Make raw data"
        for x in range(w):
            for y in range(h):
                in_r.append(random.random())
                in_g.append(random.random())
                in_b.append(random.random())

        # the unknown values
        mtx = [random.random(), random.random(), random.random()]
        print "Secret numbers were: %s" % mtx

        out_r = [(r*mtx[0] + g*mtx[1] + b*mtx[2]) for (r, g, b) in zip(in_r, in_g, in_b)]

        return {'in_r': in_r, 'in_g': in_g, 'in_b': in_b, 'expected_r': out_r}


    def score_matrix(ir, ig, ib, expected_r, mtx):
        ms = 0
        for i in range(len(ir)):
            val = ir[i] * mtx[0] + ig[i] * mtx[1] + ib[i] * mtx[2]
            ms += abs(val - expected_r[i]) ** 2
        rms = math.sqrt(ms / float(len(ir)))
        return rms


    # Make random test data
    test_data = make_test_data(16, 16)

    lowest_rms = sys.maxint
    closest = []

    divisions = 10
    for possible_r in range(divisions):
        for possible_g in range(divisions):
            for possible_b in range(divisions):
                pr, pg, pb = [x / float(divisions-1) for x in (possible_r, possible_g, possible_b)]
                rms = score_matrix(
                    test_data['in_r'], test_data['in_g'], test_data['in_b'],
                    test_data['expected_r'],
                    mtx = [pr, pg, pb])
                if rms < lowest_rms:
                    closest = [pr, pg, pb]
                    lowest_rms = rms

    print closest