How to quickly unnest a 2D array into 1D arrays in PostgreSQL?

I have a very large 2D array that I computed with Apache MADlib, and I would like to apply an operation to every 1D array inside it.

I found code that can unnest it in this related answer. However, that code is very slow on this really large 2D array (150,000+ 1D arrays of floating point numbers). While unnest() runs in only a few seconds, this code had not completed even after I waited several minutes.

Surely there is a faster way to unnest a large 2D array into smaller 1D arrays? Bonus points if the solution uses Apache MADlib. I found an entry in the documentation that looks relevant, deconstruct_2d_array; however, when I call this function on the matrix, it fails with the following error:

ERROR: Function "deconstruct_2d_array(double precision[])": Invalid type conversion. Internal composite type has more elements than backend composite type.

2 answers

The function you found in my old answer does not scale well for large arrays. I never thought about arrays of your size, which probably should be a set (table).

Be that as it may, this PL/pgSQL function replaces the one in the referenced answer. It requires Postgres 9.1 or later.

    CREATE OR REPLACE FUNCTION unnest_2d_1d(ANYARRAY, OUT a ANYARRAY)
      RETURNS SETOF ANYARRAY AS
    $func$
    BEGIN
       FOREACH a SLICE 1 IN ARRAY $1 LOOP
          RETURN NEXT;
       END LOOP;
    END
    $func$  LANGUAGE plpgsql IMMUTABLE STRICT;

It was 40 times faster than the old function in my test on a large 2D array in Postgres 9.6.

STRICT avoids an exception for NULL input (as IamIC pointed out in a comment):

ERROR: FOREACH expression must not be null
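As a minimal sketch of how the function might be called, assuming unnest_2d_1d has been created as shown above: the literal array and the aliases t and arr are made up for illustration and are not part of the original answer.

```sql
-- Each row of the result is one 1D slice of the 2D input.
-- The array literal and the aliases t / arr are illustrative only.
SELECT t.arr
FROM   unnest_2d_1d('{{1,2,3},{4,5,6}}'::int[]) AS t(arr);
-- returns two rows: {1,2,3} and {4,5,6}

-- With STRICT, NULL input yields an empty set instead of raising
-- "FOREACH expression must not be null".
SELECT count(*) AS n_rows
FROM   unnest_2d_1d(NULL::int[]);
-- n_rows is 0
```

The explicit column alias in AS t(arr) is a design choice to avoid depending on the default column name Postgres derives for the function result.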


There is now a built-in MADlib function for this, array_unnest_2d_to_1d, which was introduced in version 1.11: http://madlib.incubator.apache.org/docs/latest/array__ops_8sql__in.html#af057b589f2a2cb1095caa99feaeb3d70

Here is a usage example:

    CREATE TABLE test1 (pid int, points double precision[]);

    INSERT INTO test1 VALUES
      (100, '{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}, {7.0, 8.0, 9.0}}'),
      (101, '{{11.0, 12.0, 13.0}, {14.0, 15.0, 16.0}, {17.0, 18.0, 19.0}}'),
      (102, '{{21.0, 22.0, 23.0}, {24.0, 25.0, 26.0}, {27.0, 28.0, 29.0}}');

    SELECT * FROM test1;

produces

     pid |               points
    -----+------------------------------------
     100 | {{1,2,3},{4,5,6},{7,8,9}}
     101 | {{11,12,13},{14,15,16},{17,18,19}}
     102 | {{21,22,23},{24,25,26},{27,28,29}}
    (3 rows)

Then call the unnest function:

 SELECT pid, (madlib.array_unnest_2d_to_1d(points)).* FROM test1 ORDER BY pid, unnest_row_id; 

produces

     pid | unnest_row_id | unnest_result
    -----+---------------+---------------
     100 |             1 | {1,2,3}
     100 |             2 | {4,5,6}
     100 |             3 | {7,8,9}
     101 |             1 | {11,12,13}
     101 |             2 | {14,15,16}
     101 |             3 | {17,18,19}
     102 |             1 | {21,22,23}
     102 |             2 | {24,25,26}
     102 |             3 | {27,28,29}
    (9 rows)

where unnest_row_id is the index of the row within the 2D array.
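Since the question asks about applying an operation to every 1D array, here is a hedged sketch of one way to do that on top of the unnested rows. It assumes the test1 table above and MADlib 1.11 or later installed in the madlib schema. The per-row sum is only a stand-in for whatever operation is actually needed, and the aliases u, x, and row_sum are illustrative.

```sql
-- Unnest in a subquery first, so the output columns
-- (unnest_row_id, unnest_result) can be referenced by name outside it.
SELECT u.pid,
       u.unnest_row_id,
       -- Placeholder operation: sum the elements of each 1D array.
       (SELECT sum(x) FROM unnest(u.unnest_result) AS x) AS row_sum
FROM  (
       SELECT pid, (madlib.array_unnest_2d_to_1d(points)).*
       FROM   test1
      ) AS u
ORDER  BY u.pid, u.unnest_row_id;
```

Wrapping the MADlib call in a subquery keeps the expansion of the composite result in one place and lets the outer query treat each 1D array as an ordinary column.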
