How to write combinatorics function in postgres?

Question

How to write combinatorics function in postgres?

I have a PostgreSQL table of this form:

base_id int | mods smallint[] 3 | {7,15,48}

I need to fill out a table of this form:

 combo_id int | base_id int | mods smallint[] 1 | 3 | 2 | 3 | {7} 3 | 3 | {7,15} 4 | 3 | {7,48} 5 | 3 | {7,15,48} 6 | 3 | {15} 7 | 3 | {15,48} 8 | 3 | {48}

I think I could accomplish this using a function that does almost this, iterating over the first table and writing combinations to the second table: Generate all combinations in SQL

But, I'm a Postgres newbie and have not been able to figure out for my whole life how to do this using plpgsql. It does not have to be particularly fast; It will be launched periodically only on the server. The first table contains approximately 80 records, and a rough calculation suggests that we can expect about 2600 records for the second table.

Can anyone at least point me in the right direction?

Edit: Craig: I have PostgreSQL 9.0. I have successfully used UNNEST ():

 FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, @n) - 1 LOOP RAISE NOTICE '%', messvar; END LOOP;

but then did not know where to go next.

Edit:. For reference, I ended up using the Erwin solution, adding one line to add a null result ('{}') to each set, and a special case of Erwin refers to deletion:

 CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[]) RETURNS SETOF integer[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN IF array_length(_arr,1) > 0 THEN _up := array_upper(_arr, 1); IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF; FOR i IN array_lower(_arr, 1) .. _up LOOP FOR j IN i .. _up LOOP CASE ji WHEN 0,1 THEN RETURN NEXT _a || _arr[i:j] || _z; ELSE RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN QUERY SELECT * FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z); END CASE; END LOOP; END LOOP; ELSE RETURN NEXT _arr; END IF; END; $BODY$

Then I used this function to populate my table:

 INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers) (SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);

Of the 79 rows in the source table with a maximum of 7 elements in the modifiers array, the query took 250 ms to fill out 2630 rows in my output table. Fantastic.

+6

plpgsql postgresql combinatorics

Kim Aug 16 '12 at 23:50

source share

2 answers

One approach is recursive CTE. Erwin updated the recursive function much faster and scales better, however, so it's really useful as an interesting other approach. The updated version of Erwin is more practical.

I tried a bit-counting approach (see end), but without a quick way to snatch arbitrary elements from an array, which turned out to be slower than a recursive one.

Function of recursive CTE functions

 CREATE OR REPLACE FUNCTION combinations(anyarray) RETURNS SETOF anyarray AS $$ WITH RECURSIVE items AS ( SELECT row_number() OVER (ORDER BY item) AS rownum, item FROM (SELECT unnest($1) AS item) unnested ), q AS ( SELECT 1 AS i, $1[1:0] arr UNION ALL SELECT (i+1), CASE x WHEN 1 THEN array_append(q.arr,(SELECT item FROM items WHERE rownum = i)) ELSE q.arr END FROM generate_series(0,1) x CROSS JOIN q WHERE i <= array_upper($1,1) ) SELECT q.arr AS mods FROM q WHERE i = array_upper($1,1)+1; $$ LANGUAGE 'sql';

This is a polymorphic function, so it will work with arrays of any type.

The logic is to iterate over each item in an undefined input set using a worksheet. Start with an empty array in the worksheet with generation number 1. For each entry in the input set, insert two new arrays into the worksheet with the added generation number. One of the two is a copy of the input array from the previous generation, and the other is the input array with the number (generation number) from the incoming set added to it. When the generation number exceeds the number of elements in the input set, return the last generation.

Using

You can use the combinations(smallint[]) function to get the desired results, using it as a return set function in combination with the row_number window row_number .

 -- assuming table structure regress=# \d comb Table "public.comb" Column | Type | Modifiers ---------+------------+----------- base_id | integer | mods | smallint[] | SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x ORDER BY mod;

results

 regress=# SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod regress-# FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x regress-# ORDER BY mod; base_id | mod_id | mod ---------+--------+----------- 3 | 1 | {} 3 | 2 | {7} 3 | 3 | {7,15} 3 | 4 | {7,15,48} 3 | 5 | {7,48} 3 | 6 | {15} 3 | 7 | {15,48} 3 | 8 | {48} (8 rows) Time: 2.121 ms

Zero arrays of elements produce a null result. If you want combinations({}) return a single string {} , then UNION ALL is executed with {} .

Theory

It seems you want k-combinations for all k in k-multi-combinations , not simple combinations. See the number of combinations with repetition .

In other words, you want all k-combinations of elements from your set, for all k from 0 to n, where n is the given size.

A related SO: SQL question is to find all possible combination that has a really interesting answer about bit counting.

Bit operations exist in Pg, so a bit counting method should be possible. You expect it to be more efficient, but since it so slowly selects a scattered subset of elements from an array, it actually works slower.

 CREATE OR REPLACE FUNCTION bitwise_subarray(arr anyarray, elements integer) RETURNS anyarray AS $$ SELECT array_agg($1[n+1]) FROM generate_series(0,array_upper($1,1)-1) n WHERE ($2>>n) & 1 = 1; $$ LANGUAGE sql; COMMENT ON FUNCTION bitwise_subarray(anyarray,integer) IS 'Return the elements from $1 where the corresponding bit in $2 is set'; CREATE OR REPLACE FUNCTION comb_bits(anyarray) RETURNS SETOF anyarray AS $$ SELECT bitwise_subarray($1, x) FROM generate_series(0,pow(2,array_upper($1,1))::integer-1) x; $$ LANGUAGE 'sql';

If you could find a faster way to write bitwise_subarray , then comb_bits would be very fast. For example, a small extension function of C, but I'm crazy enough to write one of them for an SO answer .

+3

Craig ringer Aug 17 '12 at 2:14

source share

Erwin brandstetter · Accepted Answer · 2012-08-17T01:16:59+0000

After I slept, I had a completely new, simpler and faster idea:

 CREATE OR REPLACE FUNCTION f_combos(_arr anyarray) RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS $BODY$ BEGIN IF array_upper(_arr, 1) IS NULL THEN combo := _arr; RETURN NEXT; RETURN; END IF; CASE array_upper(_arr, 1) -- WHEN 0 THEN -- does not exist WHEN 1 THEN RETURN QUERY VALUES ('{}'), (_arr); WHEN 2 THEN RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]); ELSE RETURN QUERY WITH x AS ( SELECT f.combo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f ) SELECT x.combo FROM x UNION ALL SELECT x.combo || _arr[array_upper(_arr, 1)] FROM x; END CASE; END $BODY$;

Call:

 SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;

512 lines, total run time: 2.899 ms

To explain

Note the special cases with NULL and an empty array.
Assembly layout for a primitive array of two.
Any longer array is divided into:
- combinations for one array of length n-1
- plus all those combined with element n .. recursively.

Really simple once you got it.

Works for 1-dimensional integer arrays, starting at index 1 (see below).
2-3 times faster than the old solution, it scales better.
Works for any type of element again (using polymorphic types).
Includes an empty array in the result that appears in the question (and as @Craig pointed me out in the comments).
Shorter, more elegant.

This assumes array indices starting at 1 (the default). If you are unsure of your values, call a function like this to normalize:

 SELECT * FROM f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);

Not sure if there is a more elegant way to normalize array indices. I asked a question about this:
Normalize array indices for a 1-dimensional array so that they start at 1

Old solution (slower)

 CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}') RETURNS SETOF int[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN IF array_length(_arr,1) > 0 THEN _up := array_upper(_arr, 1); FOR i IN array_lower(_arr, 1) .. _up LOOP FOR j IN i .. _up LOOP CASE ji WHEN 0,1 THEN RETURN NEXT _a || _arr[i:j] || _z; WHEN 2 THEN RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN NEXT _a || _arr[i:j] || _z; ELSE RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z; RETURN QUERY SELECT * FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z); END CASE; END LOOP; END LOOP; ELSE RETURN NEXT _arr; END IF; END; $BODY$;

Call:

 SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;

Works for one-dimensional integer arrays. This can be further optimized, but it is certainly not required for the scope of this issue.
ORDER BY to override the order displayed in the question.

Provide for NULL or an empty array, since NULL is mentioned in the comments.

Tested with PostgreSQL 9.1, but should work with any intermediate version. array_lower() and array_upper() existed since at least PostgreSQL 7.4. Only the default parameter values are new in version 8.4. It can be easily replaced.

The performance is decent.

 SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;

511 lines, total execution time: 7.729 ms

Explanation

It is based on this simple form, which creates only all combinations of neighboring elements:

 CREATE FUNCTION f_combos(_arr int[]) RETURNS SETOF int[] LANGUAGE plpgsql AS $BODY$ DECLARE i int; j int; _up int; BEGIN _up := array_upper(_arr, 1); FOR i in array_lower(_arr, 1) .. _up LOOP FOR j in i .. _up LOOP RETURN NEXT _arr[i:j]; END LOOP; END LOOP; END; $BODY$;

But this fails for sub-arrays with more than two elements. So:

For any submatrix with 3 elements, one array is added with only two external elements. it is a shortcut for this special occasion that improves performance and is not strictly necessary.
For any submatrix with more than three elements, I take the outer two elements and fill in all the combinations of internal elements constructed using the same function recursively .

How to write combinatorics function in postgres?

To explain

Old solution (slower)

Explanation

Function of recursive CTE functions

Using

results

Theory

More articles: