Why it does not work
The index type (i.e. the operator class) gin_trgm_ops is based on the % operator, which works on two text arguments:
    CREATE OPERATOR trgm.% (
        PROCEDURE = trgm.similarity_op,
        LEFTARG = text,
        RIGHTARG = text,
        COMMUTATOR = %,
        RESTRICT = contsel,
        JOIN = contjoinsel);
You cannot use gin_trgm_ops for arrays. An index defined on an array column will never work with any(array[...]), because the individual elements of the arrays are not indexed. Indexing an array requires a different type of index, namely a GIN array index.
Fortunately, gin_trgm_ops also supports like and ilike, which can be used as an alternative solution (an example is described below).
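As a rough sketch of the GIN array index mentioned above, assuming the test table described in the next section: a plain GIN index on the array column uses the default array operator class, which supports containment and overlap operators such as @> and &&. It matches whole elements exactly, so it is not a substitute for a similarity search. The index name test_names_array_idx is only illustrative.

```sql
-- Plain GIN index on the array column (default array operator class).
-- The index name is hypothetical.
create index test_names_array_idx on test using gin (names);

-- Exact element match: rows whose array contains the element 'praesent'.
-- This can use the index above, but it does no trigram matching.
select count(*)
from test
where names @> array['praesent'];
```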
Test table
It has two columns (id serial primary key, names text[]) and contains 100,000 Latin sentences split into array elements.
    select count(*), sum(cardinality(names))::int words
    from test;
Searching for the word praesent returns 7051 rows in 2400 ms:
    explain analyse
    select count(*)
    from test
    where 'praesent' % any(names);
Materialized view
One solution is to normalize the model, which involves creating a new table with one name per row. Such a restructuring can be difficult to implement, and sometimes impossible because of existing queries, views, functions, or other dependencies. A similar effect can be achieved without changing the structure of the table by using a materialized view.
    create materialized view test_names as
        select id, name, name_id
        from test
        cross join unnest(names) with ordinality u(name, name_id)
    with data;
The with ordinality clause is not required, but it can be useful when aggregating the names in the same order as in the main table. Without an index, a search in test_names gives the same results as a search in the main table, in about the same time.
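To illustrate the point about ordering, here is a small sketch that rebuilds each array from the view. It relies only on the test_names columns defined above; the output column alias is an arbitrary choice.

```sql
-- Re-aggregate the names per id, using name_id (the ordinality column)
-- to keep the elements in the same order as in test.names.
select id, array_agg(name order by name_id) as names
from test_names
group by id;
```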
After creating the index, the execution time decreases several times:
    create index on test_names using gin (name gin_trgm_ops);

    explain analyse
    select count(distinct id)
    from test_names
    where 'praesent' % name;
The solution has some drawbacks. Because the view is materialized, the data is stored twice in the database. You have to remember to refresh the view after changes to the main table. Queries may also be more complicated because of the need to join the view to the main table.
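The refresh and the join back to the main table could look roughly like this. It is only a sketch: the unique index is an assumption added here because refresh ... concurrently requires one, and (id, name_id) is unique by construction of the view.

```sql
-- A unique index is required for a concurrent refresh
-- (an addition for this sketch, not part of the setup above).
create unique index on test_names (id, name_id);

-- Re-run after changes to the main table; "concurrently" avoids
-- blocking readers of the view during the refresh.
refresh materialized view concurrently test_names;

-- Fetch full rows from the main table for ids with a matching name.
select t.*
from test t
where exists (
    select 1
    from test_names n
    where n.id = t.id
      and 'praesent' % n.name
);
```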
Using ilike
We can use ilike on the arrays represented as text. We need an immutable function to create an index on the array as a whole:
    create function text(text[])
    returns text language sql immutable as
    $$ select $1::text $$;

    create index on test using gin (text(names) gin_trgm_ops);
and use the function in the queries:
    explain analyse
    select count(*)
    from test
    where text(names) ilike '%praesent%';
60 versus 2400 ms, quite a nice result without the need to create additional relations.
This solution seems simpler and requires less work, provided that ilike, which is a less precise tool than the trgm % operator, is sufficient.
Why should we use ilike instead of % for whole arrays as text? The similarity depends largely on the length of the texts, so it is very difficult to choose a suitable limit when searching for a word in long texts of various lengths. For example, with limit = 0.3, the following query shows the similarity and whether the text matches, for texts of decreasing length:

    with data(txt) as (
        values
            ('praesentium,distinctio,modi,nulla,commodi,tempore'),
            ('praesentium,distinctio,modi,nulla,commodi'),
            ('praesentium,distinctio,modi,nulla'),
            ('praesentium,distinctio,modi'),
            ('praesentium,distinctio'),
            ('praesentium')
    )
    select length(txt), similarity('praesent', txt), 'praesent' % txt "matched?"
    from data;
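To see why the length matters, it can help to look at the trigrams themselves. The sketch below uses the pg_trgm functions show_trgm, show_limit and set_limit; the sample strings are taken from the query above. Similarity is the number of trigrams the two texts share divided by the number of distinct trigrams in both, so every extra word in the longer text adds trigrams that the search word cannot match.

```sql
-- Trigrams extracted from the search word and from a longer text.
select show_trgm('praesent');
select show_trgm('praesentium,distinctio');

-- The number of distinct trigrams grows with the text, which lowers
-- the similarity to the fixed search word.
select txt,
       cardinality(show_trgm(txt)) as trigrams,
       similarity('praesent', txt)
from (values ('praesentium'),
             ('praesentium,distinctio,modi')) as d(txt);

-- Inspect and set the limit used by the % operator in this session.
select show_limit();
select set_limit(0.3);
```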