Based on the example from Jonathan Leffler and RET's comments on ordering the concatenated values, using Informix 12.10FC8DE, I came up with the following set of user-defined functions and aggregate:
CREATE FUNCTION mgc_init ( dummy VARCHAR(255) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    RETURN SET{}::SET(LVARCHAR(2048) NOT NULL);
END FUNCTION;

CREATE FUNCTION mgc_iter ( p_result SET(LVARCHAR(2048) NOT NULL),
                           p_value  VARCHAR(255) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    IF p_value IS NOT NULL THEN
        INSERT INTO TABLE(p_result) VALUES (TRIM(p_value));
    END IF;
    RETURN p_result;
END FUNCTION;

CREATE FUNCTION mgc_comb ( p_partial1 SET(LVARCHAR(2048) NOT NULL),
                           p_partial2 SET(LVARCHAR(2048) NOT NULL) )
    RETURNING SET(LVARCHAR(2048) NOT NULL);
    INSERT INTO TABLE(p_partial1)
        SELECT vc1 FROM TABLE(p_partial2)(vc1);
    RETURN p_partial1;
END FUNCTION;

CREATE FUNCTION mgc_fini ( p_final SET(LVARCHAR(2048) NOT NULL) )
    RETURNING LVARCHAR;
    DEFINE l_str   LVARCHAR(2048);
    DEFINE l_value LVARCHAR(2048);
    LET l_str = NULL;
    FOREACH
        SELECT vvalue1 INTO l_value
          FROM TABLE(p_final) AS vt1(vvalue1)
         ORDER BY vvalue1
        IF l_str IS NULL THEN
            LET l_str = l_value;
        ELSE
            LET l_str = l_str || ',' || l_value;
        END IF;
    END FOREACH;
    RETURN l_str;
END FUNCTION;

GRANT EXECUTE ON mgc_fini TO PUBLIC;

CREATE AGGREGATE m_group_concat WITH (
    INIT    = mgc_init,
    ITER    = mgc_iter,
    COMBINE = mgc_comb,
    FINAL   = mgc_fini
);
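A quick usage sketch, assuming the aggregate above has been created (the table and column names here are illustrative, not from the original):

```sql
-- Hypothetical demo table to exercise m_group_concat.
CREATE TEMP TABLE t_demo (grp INTEGER, val VARCHAR(255));
INSERT INTO t_demo VALUES (1, 'beta');
INSERT INTO t_demo VALUES (1, 'alpha');
INSERT INTO t_demo VALUES (1, 'beta');   -- duplicate; the SET drops it
INSERT INTO t_demo VALUES (2, 'gamma');

-- Each group yields one ordered, de-duplicated, comma-separated string,
-- e.g. grp 1 should produce 'alpha,beta'.
SELECT grp, m_group_concat(val)
  FROM t_demo
 GROUP BY grp;
```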
The concatenated values will not contain duplicates and will be ordered.
I used Informix collections, namely SET, which does not allow duplicate values, to try to keep the code somewhat simple.
The method uses a SET to store the intermediate results (eliminating duplicates along the way); at the end, it builds the concatenated string from the ordered values of the final SET.
Using LVARCHAR for the SET elements is deliberate: I originally used VARCHAR, but memory consumption was very, very high. The documentation hints that internally Informix may convert VARCHAR to CHAR (padding values to their full declared length). Switching to LVARCHAR did reduce memory consumption, although it is still high.
However, this implementation's cumulative memory consumption is about two orders of magnitude higher than Jonathan's, and it is about two times slower in our tests (using a table of about 300,000 rows).
Therefore, use with caution: it consumes a lot of memory and has not been extensively verified (there may be a memory leak somewhere).
EDIT 1:
My previous code seems to leak memory somewhere (or, internally, Informix materializes the collection-derived tables and can generate a lot of them).
So, while still trying to avoid having to code an aggregate in C, here is another alternative that uses the Informix built-in BSON functions; it needs much less memory and is a little faster.
CREATE FUNCTION m2gc_init ( dummy VARCHAR(255) )
    RETURNING BSON;
    RETURN '{"terms":[]}'::JSON::BSON;
END FUNCTION;

CREATE FUNCTION m2gc_iter ( p_result BSON, p_value VARCHAR(255) )
    RETURNING BSON;
    DEFINE l_add_array_element LVARCHAR(2048);
    IF p_value IS NOT NULL THEN
        LET l_add_array_element = '{ $addToSet: { terms: "' || TRIM(p_value) || '" } }';
        LET p_result = BSON_UPDATE(p_result, l_add_array_element);
    END IF;
    RETURN p_result;
END FUNCTION;

CREATE FUNCTION m2gc_comb ( p_partial1 BSON, p_partial2 BSON )
    RETURNING BSON;
    DEFINE l_array_elements LVARCHAR(2048);
    DEFINE l_an_element     LVARCHAR(2048);
    DEFINE l_guard          INTEGER;
    LET l_array_elements = NULL;
    LET l_guard = BSON_SIZE(p_partial2, 'terms.0');
    IF l_guard > 0 THEN
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_partial2, 'terms.0');
            IF l_array_elements IS NULL THEN
                LET l_array_elements = '"' || l_an_element || '"';
            ELSE
                LET l_array_elements = l_array_elements || ', "' || l_an_element || '"';
            END IF;
            LET p_partial2 = BSON_UPDATE(p_partial2, '{ $pop: { terms: -1 } }');
            LET l_guard = BSON_SIZE(p_partial2, 'terms.0');
        END WHILE;
        LET l_array_elements = '{ $addToSet: { terms: { $each: [ ' || l_array_elements || ' ] } } }';
        LET p_partial1 = BSON_UPDATE(p_partial1, l_array_elements);
    END IF;
    RETURN p_partial1;
END FUNCTION;

CREATE FUNCTION m2gc_fini ( p_final BSON )
    RETURNING LVARCHAR;
    DEFINE l_str_agg    LVARCHAR(2048);
    DEFINE l_an_element LVARCHAR(2048);
    DEFINE l_iter_int   INTEGER;
    DEFINE l_guard      INTEGER;
    LET l_str_agg = NULL;
    LET l_guard = BSON_SIZE(p_final, 'terms.0');
    IF l_guard > 0 THEN
        LET p_final = BSON_UPDATE(p_final, '{ $push: { terms: { $each: [], $sort: 1 } } }');
        LET l_iter_int = 0;
        WHILE l_guard > 0
            LET l_an_element = BSON_VALUE_LVARCHAR(p_final, 'terms.' || l_iter_int);
            IF l_str_agg IS NULL THEN
                LET l_str_agg = TRIM(l_an_element);
            ELSE
                LET l_str_agg = l_str_agg || ',' || TRIM(l_an_element);
            END IF;
            LET l_iter_int = l_iter_int + 1;
            LET l_guard = BSON_SIZE(p_final, 'terms.' || l_iter_int);
        END WHILE;
    END IF;
    RETURN l_str_agg;
END FUNCTION;

CREATE AGGREGATE m2_group_concat WITH (
    INIT    = m2gc_init,
    ITER    = m2gc_iter,
    COMBINE = m2gc_comb,
    FINAL   = m2gc_fini
);
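The BSON-based aggregate is called the same way as the SET-based one; a self-contained usage sketch (table and column names are illustrative, not from the original):

```sql
-- Hypothetical demo table to exercise m2_group_concat.
CREATE TEMP TABLE t_demo2 (grp INTEGER, val VARCHAR(255));
INSERT INTO t_demo2 VALUES (1, 'beta');
INSERT INTO t_demo2 VALUES (1, 'alpha');
INSERT INTO t_demo2 VALUES (1, 'beta');  -- duplicate; $addToSet drops it

-- grp 1 should yield the ordered, de-duplicated string 'alpha,beta'.
SELECT grp, m2_group_concat(val)
  FROM t_demo2
 GROUP BY grp;
```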
The aggregated return value is ordered and contains no duplicates.
Again, this has been only superficially verified; it is just a proof of concept (POC).
One of the problems is that it does not sanitize the input values. Some of the BSON functions receive update documents that are built by concatenating strings, and unescaped characters can break those documents. For example, a string value containing a double quote, such as 'I"BrokeIt', can trigger a range of errors (including assertion failures).
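One possible mitigation, which I have not verified, would be to escape backslashes and double quotes before splicing the value into the update document in the ITER step (the combine step rebuilds strings from extracted values and would need the same treatment). A sketch, using the built-in REPLACE function and a hypothetical local variable l_value:

```sql
-- Sketch of input escaping for m2gc_iter (untested; l_value is a local
-- variable introduced for this example, declared with DEFINE).
DEFINE l_value LVARCHAR(2048);

-- Escape backslashes first, then double quotes, so the value can be
-- embedded safely inside the JSON-style update document.
LET l_value = REPLACE(TRIM(p_value), '\', '\\');
LET l_value = REPLACE(l_value, '"', '\"');
LET l_add_array_element = '{ $addToSet: { terms: "' || l_value || '" } }';
```

This only addresses quoting; other BSON metacharacters and edge cases would still need review.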
And I'm sure there are other problems.
However, the memory consumption of this implementation is of the same order as Jonathan's example, and it is about 60% slower (again, only very rudimentary testing was performed).