Based on the example from Jonathan Leffler and RET's comments on ordering the concatenated values, using Informix 12.10FC8DE, I came up with the following set of user-defined functions and aggregate:
CREATE FUNCTION mgc_init ( dummy VARCHAR(255) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    RETURN SET{}::SET(LVARCHAR(2048) NOT NULL);
END FUNCTION;

CREATE FUNCTION mgc_iter ( p_result SET(LVARCHAR(2048) NOT NULL),
                           p_value  VARCHAR(255) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    IF p_value IS NOT NULL THEN
        INSERT INTO TABLE(p_result) VALUES (TRIM(p_value));
    END IF;
    RETURN p_result;
END FUNCTION;

CREATE FUNCTION mgc_comb ( p_partial1 SET(LVARCHAR(2048) NOT NULL),
                           p_partial2 SET(LVARCHAR(2048) NOT NULL) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    INSERT INTO TABLE(p_partial1)
        SELECT vc1 FROM TABLE(p_partial2)(vc1);
    RETURN p_partial1;
END FUNCTION;

CREATE FUNCTION mgc_fini ( p_final SET(LVARCHAR(2048) NOT NULL) )
    RETURNING LVARCHAR;
    DEFINE l_str   LVARCHAR(2048);
    DEFINE l_value LVARCHAR(2048);
    LET l_str = NULL;
    FOREACH
        SELECT vvalue1 INTO l_value
          FROM TABLE(p_final) AS vt1(vvalue1)
         ORDER BY vvalue1
        IF l_str IS NULL THEN
            LET l_str = l_value;
        ELSE
            LET l_str = l_str || ',' || l_value;
        END IF;
    END FOREACH;
    RETURN l_str;
END FUNCTION;

GRANT EXECUTE ON mgc_fini TO PUBLIC;

CREATE AGGREGATE m_group_concat WITH (
    INIT    = mgc_init,
    ITER    = mgc_iter,
    COMBINE = mgc_comb,
    FINAL   = mgc_fini
);
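A quick usage sketch, assuming the aggregate above has been created (the table and column names here are illustrative, not from the original):

```sql
-- Hypothetical demo table to exercise m_group_concat.
CREATE TEMP TABLE t_demo (grp INTEGER, val VARCHAR(255));
INSERT INTO t_demo VALUES (1, 'beta');
INSERT INTO t_demo VALUES (1, 'alpha');
INSERT INTO t_demo VALUES (1, 'beta');   -- duplicate; the SET drops it
INSERT INTO t_demo VALUES (2, 'gamma');

-- Each group yields one ordered, de-duplicated, comma-separated string,
-- e.g. grp 1 should produce 'alpha,beta'.
SELECT grp, m_group_concat(val)
  FROM t_demo
 GROUP BY grp;
```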
The concatenated values will not contain duplicates and will be ordered.
I used Informix collections, namely SET, which does not allow duplicate values, to try to keep the code somewhat simple.
The method uses a SET to store the intermediate results (eliminating duplicates along the way); at the end, it builds the concatenated string from the ordered values of the final SET.
Using LVARCHAR for the SET elements is deliberate: I originally used VARCHAR, but memory consumption was very, very high. The documentation hints that internally Informix may convert VARCHAR to CHAR (padding values to their full declared length). Switching to LVARCHAR did reduce memory consumption, although it is still high.
However, this implementation's cumulative memory consumption is about two orders of magnitude higher than Jonathan's, and it is about two times slower in our tests (using a table of about 300,000 rows).
Therefore, use with caution: it consumes a lot of memory and has not been extensively verified (there may be a memory leak somewhere).
EDIT 1:
My previous code seems to leak memory somewhere (or, internally, Informix materializes the collection-derived tables and can generate a lot of them).
So, while still trying to avoid having to code an aggregate in C, here is another alternative that uses the Informix built-in BSON functions; it needs much less memory and is a little faster.
CREATE FUNCTION m2gc_init ( dummy VARCHAR(255) )
    RETURNING BSON;
    RETURN '{"terms":[]}'::JSON::BSON;
END FUNCTION;

CREATE FUNCTION m2gc_iter ( p_result BSON, p_value VARCHAR(255) )
    RETURNING BSON;
    DEFINE l_add_array_element LVARCHAR(2048);
    IF p_value IS NOT NULL THEN
        LET l_add_array_element = '{ $addToSet: { terms: "' || TRIM(p_value) || '" } }';
        LET p_result = BSON_UPDATE(p_result, l_add_array_element);
    END IF;
    RETURN p_result;
END FUNCTION;

CREATE FUNCTION m2gc_comb ( p_partial1 BSON, p_partial2 BSON )
    RETURNING BSON;
    DEFINE l_array_elements LVARCHAR(2048);
    DEFINE l_an_element     LVARCHAR(2048);
    DEFINE l_guard          INTEGER;
    LET l_array_elements = NULL;
    LET l_guard = BSON_SIZE(p_partial2, 'terms.0');
    IF l_guard > 0 THEN
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_partial2, 'terms.0');
            IF l_array_elements IS NULL THEN
                LET l_array_elements = '"' || l_an_element || '"';
            ELSE
                LET l_array_elements = l_array_elements || ', "' || l_an_element || '"';
            END IF;
            LET p_partial2 = BSON_UPDATE(p_partial2, '{ $pop: { terms: -1 } }');
            LET l_guard = BSON_SIZE(p_partial2, 'terms.0');
        END WHILE;
        LET l_array_elements = '{ $addToSet: { terms: { $each: [ ' || l_array_elements || ' ] } } }';
        LET p_partial1 = BSON_UPDATE(p_partial1, l_array_elements);
    END IF;
    RETURN p_partial1;
END FUNCTION;

CREATE FUNCTION m2gc_fini ( p_final BSON )
    RETURNING LVARCHAR;
    DEFINE l_str_agg    LVARCHAR(2048);
    DEFINE l_an_element LVARCHAR(2048);
    DEFINE l_iter_int   INTEGER;
    DEFINE l_guard      INTEGER;
    LET l_str_agg = NULL;
    LET l_guard = BSON_SIZE(p_final, 'terms.0');
    IF l_guard > 0 THEN
        LET p_final = BSON_UPDATE(p_final, '{ $push: { terms: { $each: [], $sort: 1 } } }');
        LET l_iter_int = 0;
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_final, 'terms.' || l_iter_int);
            IF l_str_agg IS NULL THEN
                LET l_str_agg = TRIM(l_an_element);
            ELSE
                LET l_str_agg = l_str_agg || ',' || TRIM(l_an_element);
            END IF;
            LET l_iter_int = l_iter_int + 1;
            LET l_guard = BSON_SIZE(p_final, 'terms.' || l_iter_int);
        END WHILE;
    END IF;
    RETURN l_str_agg;
END FUNCTION;

CREATE AGGREGATE m2_group_concat WITH (
    INIT    = m2gc_init,
    ITER    = m2gc_iter,
    COMBINE = m2gc_comb,
    FINAL   = m2gc_fini
);
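The BSON-based aggregate is called the same way as the SET-based one; a self-contained usage sketch (table and column names are illustrative, not from the original):

```sql
-- Hypothetical demo table to exercise m2_group_concat.
CREATE TEMP TABLE t_demo2 (grp INTEGER, val VARCHAR(255));
INSERT INTO t_demo2 VALUES (1, 'beta');
INSERT INTO t_demo2 VALUES (1, 'alpha');
INSERT INTO t_demo2 VALUES (1, 'beta');  -- duplicate; $addToSet drops it

-- grp 1 should yield the ordered, de-duplicated string 'alpha,beta'.
SELECT grp, m2_group_concat(val)
  FROM t_demo2
 GROUP BY grp;
```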
The aggregated return value is ordered and contains no duplicates.
Again, this has been only superficially verified; it is just a proof of concept (POC).
One of the problems is that it does not sanitize the input values. Some of the BSON functions receive update documents that are built by concatenating strings, and unescaped characters can break those documents. For example, a string value containing a double quote, such as 'I"BrokeIt', can trigger a range of errors (including assertion failures).
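One possible mitigation, which I have not verified, would be to escape backslashes and double quotes before splicing the value into the update document in the ITER step (the combine step rebuilds strings from extracted values and would need the same treatment). A sketch, using the built-in REPLACE function and a hypothetical local variable l_value:

```sql
-- Sketch of input escaping for m2gc_iter (untested; l_value is a local
-- variable introduced for this example, declared with DEFINE).
DEFINE l_value LVARCHAR(2048);

-- Escape backslashes first, then double quotes, so the value can be
-- embedded safely inside the JSON-style update document.
LET l_value = REPLACE(TRIM(p_value), '\', '\\');
LET l_value = REPLACE(l_value, '"', '\"');
LET l_add_array_element = '{ $addToSet: { terms: "' || l_value || '" } }';
```

This only addresses quoting; other BSON metacharacters and edge cases would still need review.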
And I'm sure there are other problems.
However, the memory consumption of this implementation is of the same order as Jonathan's example, and it is about 60% slower (again, only very rudimentary testing was performed).