I have a database table where each row is a color. My goal: to set the input color, calculate its distance to each color in the database table and sort the results by this distance. Or, as a user story: when I select a color, I want to see a list of colors that are most similar to the one I selected, with the closest matches at the top of the list.
I understand that various Delta E formulas (CIE Lab) are the best choice for this . I could not find any native SQL versions of the formulas, so I wrote my own SQL versions of the Delta E CIE 1976 and Delta E CIE 2000 . I checked the accuracy of my SQL versions of formulas against the results generated by python-colormath .
Formula 1976 is easy to write in SQL or any other language, because it is a simple calculation of Euclidean distance. It works well and quickly for me, on data sets of any size (I checked it on a color table with 100,000 rows, and the query takes less than 1 second).
Formula 2000, by contrast, is very long and complex. I managed to implement it in SQL, but its performance is small: about 5 seconds for querying 10,000 rows and about 1 minute for querying 100,000 rows.
I wrote an example application called colorsearchtest (in Python / Flask / Postgres) to play with my implementations (and I set up a demo on Heroku ). If you try this application, you can clearly see the difference in performance between Delta E. 1976 and 2000 queries.
This is a diagram for a color table (for each color, it stores the corresponding RGB and Lab representations as three numerical values):
CREATE TABLE color ( id integer NOT NULL, rgb_r integer, rgb_g integer, rgb_b integer, lab_l double precision, lab_a double precision, lab_b double precision );
This is some data in the table (all random colors created using a script in my application):
INSERT INTO color (id, rgb_r, rgb_g, rgb_b, lab_l, lab_a, lab_b) VALUES (902, 164, 214, 189, 81.6521019943304793, -21.2561872439361323, 7.08354581694699004); INSERT INTO color (id, rgb_r, rgb_g, rgb_b, lab_l, lab_a, lab_b) VALUES (903, 113, 229, 64, 81.7930860963098212, -60.5865728472875205, 66.4022741184551819); INSERT INTO color (id, rgb_r, rgb_g, rgb_b, lab_l, lab_a, lab_b) VALUES (904, 65, 86, 78, 34.6593864327796624, -9.95482220634028003, 2.02661293272071719); ...
And this is the Delta E CIE 2000 SQL function that I use:
CREATE OR REPLACE FUNCTION DELTA_E_CIE2000(double precision, double precision, double precision, double precision, double precision, double precision, double precision, double precision, double precision) RETURNS double precision AS $$ WITH c AS (SELECT (CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR)) AS lab_pair_str, (($1 + $4) / 2.0) AS avg_lp, SQRT( POW($2, 2.0) + POW($3, 2.0)) AS c1, SQRT( POW(($5), 2.0) + POW(($6), 2.0)) AS c2), gs AS (SELECT c.lab_pair_str, (0.5 * (1.0 - SQRT( POW(((c.c1 + c.c2) / 2.0), 7.0) / ( POW(((c.c1 + c.c2) / 2.0), 7.0) + POW(25.0, 7.0))))) AS g FROM c WHERE c.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), ap AS (SELECT gs.lab_pair_str, ((1.0 + gs.g) * $2) AS a1p, ((1.0 + gs.g) * $5) AS a2p FROM gs WHERE gs.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), cphp AS (SELECT ap.lab_pair_str, SQRT( POW(ap.a1p, 2.0) + POW($3, 2.0)) AS c1p, SQRT( POW(ap.a2p, 2.0) + POW($6, 2.0)) AS c2p, ( DEGREES(ATAN2($3, ap.a1p)) + ( CASE WHEN DEGREES(ATAN2($3, ap.a1p)) < 0.0 THEN 360.0 ELSE 0.0 END)) AS h1p, ( DEGREES(ATAN2($6, ap.a2p)) + ( CASE WHEN DEGREES(ATAN2($6, ap.a2p)) < 0.0 THEN 360.0 ELSE 0.0 END)) AS h2p FROM ap WHERE ap.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), av AS (SELECT cphp.lab_pair_str, ((cphp.c1p + cphp.c2p) / 2.0) AS avg_c1p_c2p, (((CASE WHEN (ABS(cphp.h1p - cphp.h2p) > 180.0) THEN 360.0 ELSE 0.0 END) + cphp.h1p + cphp.h2p) / 2.0) AS avg_hp FROM cphp WHERE cphp.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), ts AS (SELECT av.lab_pair_str, (1.0 - 0.17 * COS(RADIANS(av.avg_hp - 30.0)) + 0.24 * COS(RADIANS(2.0 * av.avg_hp)) + 0.32 * COS(RADIANS(3.0 * av.avg_hp + 6.0)) - 0.2 * COS(RADIANS(4.0 * av.avg_hp - 63.0))) AS t, (( (cphp.h2p - cphp.h1p) + (CASE WHEN (ABS(cphp.h2p - cphp.h1p) > 180.0) THEN 360.0 ELSE 0.0 END)) - (CASE WHEN (cphp.h2p > cphp.h1p) THEN 720.0 ELSE 0.0 END)) AS delta_hlp FROM av INNER JOIN cphp ON av.lab_pair_str = cphp.lab_pair_str WHERE av.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), d AS (SELECT ts.lab_pair_str, ($4 - $1) AS delta_lp, (cphp.c2p - cphp.c1p) AS delta_cp, (2.0 * ( SQRT(cphp.c2p * cphp.c1p) * SIN(RADIANS(ts.delta_hlp) / 2.0))) AS delta_hp, (1.0 + ( (0.015 * POW(c.avg_lp - 50.0, 2.0)) / SQRT(20.0 + POW(c.avg_lp - 50.0, 2.0)))) AS s_l, (1.0 + 0.045 * av.avg_c1p_c2p) AS s_c, (1.0 + 0.015 * av.avg_c1p_c2p * ts.t) AS s_h, (30.0 * EXP(-(POW(((av.avg_hp - 275.0) / 25.0), 2.0)))) AS delta_ro, SQRT( (POW(av.avg_c1p_c2p, 7.0)) / (POW(av.avg_c1p_c2p, 7.0) + POW(25.0, 7.0))) AS r_c FROM ts INNER JOIN cphp ON ts.lab_pair_str = cphp.lab_pair_str INNER JOIN c ON ts.lab_pair_str = c.lab_pair_str INNER JOIN av ON ts.lab_pair_str = av.lab_pair_str WHERE ts.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))), r AS (SELECT d.lab_pair_str, (-2.0 * d.r_c * SIN(2.0 * RADIANS(d.delta_ro))) AS r_t FROM d WHERE d.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR))) SELECT SQRT( POW(d.delta_lp / (d.s_l * $7), 2.0) + POW(d.delta_cp / (d.s_c * $8), 2.0) + POW(d.delta_hp / (d.s_h * $9), 2.0) + r.r_t * (d.delta_cp / (d.s_c * $8)) * (d.delta_hp / (d.s_h * $9))) AS delta_e_cie2000 FROM r INNER JOIN d ON r.lab_pair_str = d.lab_pair_str WHERE r.lab_pair_str = ( CAST($1 AS VARCHAR) || ',' || CAST($2 AS VARCHAR) || ',' || CAST($3 AS VARCHAR) || ',' || CAST($4 AS VARCHAR) || ',' || CAST($5 AS VARCHAR) || ',' || CAST($6 AS VARCHAR)) $$ LANGUAGE SQL IMMUTABLE RETURNS NULL ON NULL INPUT;
(I originally wrote this function using nested subqueries at about 10 levels, but then rewrote it instead of using WITH statements, that is, Postgres CTE. The new version is much readable, and the performance is similar to the old version. You can see both versions in the code .)
After defining the function, I use it in the following query:
SELECT c.rgb_r, c.rgb_g, c.rgb_b, DELTA_E_CIE2000(73.9206633504, -50.2996953437, 23.8259166281, c.lab_l, c.lab_a, c.lab_b, 1.0, 1.0, 1.0) AS de2000 FROM color c ORDER BY de2000 LIMIT 100;
So my question is: is there a way to improve the performance of the DELTA_E_CIE2000 function DELTA_E_CIE2000 that it can be used in real time for non-trivial data sets? Or, given the complexity of the formula, is that as fast as it is going to get?
From the testing that I did in my demo application, I would say that for the example of using a simple search for “similar colors” on a website, the difference in the accuracy of the results between the functions of 1976 and 2000 is. virtually negligible. That is, I am already sure that for my needs the 1976 formula is "good enough." However, the 2000 function returns slightly better results (depending on where the input color lies in the laboratory space), and indeed, I'm just wondering if it can accelerate its further progress.