This is how I think the feature vector is computed:
The image is divided into a 3 x 3 grid, which gives you 9 rectangles.
Each pixel is essentially 3 numbers, one for each of the red, green, and blue channels.
For each rectangle, you calculate the average of the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (average for R, G, B) = 27 numbers.
Just combine these 27 numbers into a single 27-by-1 (often written as 27 x 1) vector, that is, 27 numbers grouped together. This 27-element vector is the feature vector X, which represents the color statistics of your photo. In code, if you use C++, it will probably be an array of 27 numbers, or perhaps even an instance of the (aptly named) vector class. You can think of this feature vector as a kind of "summary" of what the colors in the photo look like. Roughly speaking, it looks like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9], where R1 is the mean (average) red value of the pixels in the first rectangle, and so on.
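The steps above can be sketched in a few lines. This is only an illustration under my own assumptions, not the original implementation: the image is a plain list of rows of (r, g, b) tuples, it is at least 3 x 3 pixels, and the function name `grid_feature_vector` is made up for this example.

```python
def grid_feature_vector(pixels, rows=3, cols=3):
    """Return [R1, G1, B1, ..., R9, G9, B9]: the mean color of each grid cell.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    Assumes the image has at least `rows` x `cols` pixels.
    """
    height = len(pixels)
    width = len(pixels[0])
    features = []
    for i in range(rows):
        # Integer cell boundaries, so the cells cover the whole image even
        # when its size is not an exact multiple of the grid size.
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # One average per channel: red, then green, then blue.
            for channel in range(3):
                features.append(sum(p[channel] for p in cell) / len(cell))
    return features


# A 6 x 6 test image: left third red, middle third green, right third blue.
row = [(255, 0, 0)] * 2 + [(0, 255, 0)] * 2 + [(0, 0, 255)] * 2
image = [row] * 6
vector = grid_feature_vector(image)
print(len(vector))  # 27
print(vector[:9])   # [255.0, 0.0, 0.0, 0.0, 255.0, 0.0, 0.0, 0.0, 255.0]
```

The cells are visited row by row, so entries 1 to 3 belong to the top-left rectangle and entries 25 to 27 to the bottom-right one.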
I believe step 2 involves some form of comparison between these feature vectors, so that photos with similar feature vectors (and therefore similar colors) are grouped together. The comparison probably uses the Euclidean distance (see here) or some other metric to measure how similar the feature vectors (and therefore the colors of the photos) are to each other.
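As a rough sketch of what such a comparison could look like, here is the Euclidean distance between two 27-element vectors, plus a nearest-match lookup. Whether step 2 really works this way is my guess; the helper names `euclidean_distance` and `most_similar` and the sample vectors are invented for illustration.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, candidates):
    """Return the index of the candidate vector closest to `query`."""
    return min(range(len(candidates)),
               key=lambda i: euclidean_distance(query, candidates[i]))


# Three made-up 27-element vectors (the same color repeated in all 9 cells).
reddish = [200.0, 10.0, 10.0] * 9
also_reddish = [190.0, 20.0, 15.0] * 9
bluish = [10.0, 10.0, 200.0] * 9

print(euclidean_distance(reddish, also_reddish))  # 45.0
print(most_similar(reddish, [bluish, also_reddish]))  # 1
```

A smaller distance means more similar colors, so the reddish photo is matched with the other reddish one rather than the bluish one.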
Finally, as Anoni-Mousse suggested, it would be preferable to convert your pixels from RGB to the HSB/HSV color space. If you use OpenCV or have access to it, this is a one-liner. Otherwise, the Wikipedia article on HSV gives the mathematical formulas for the conversion.
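If you just want to see the conversion in action without OpenCV, Python's standard `colorsys` module implements it. The sketch below assumes 8-bit channel values (0 to 255) and reports hue in degrees; the wrapper name `rgb255_to_hsv` is mine. With OpenCV the equivalent call would be `cv2.cvtColor` with the `cv2.COLOR_BGR2HSV` flag, which uses different value ranges (for 8-bit images hue is stored as 0 to 179).

```python
import colorsys


def rgb255_to_hsv(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    # colorsys works on floats in the range 0.0 to 1.0.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


print(rgb255_to_hsv(255, 0, 0))  # pure red: (0.0, 1.0, 1.0)
print(rgb255_to_hsv(0, 255, 0))  # pure green: hue of about 120 degrees
```

One thing to watch if you average in HSV: hue is an angle, so values just below 360 and just above 0 are both red, and a plain arithmetic mean of the two would land on a completely different color.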
Hope this helps.