Indexing jsonb data to match patterns

This is a follow-up to:
Matching pattern for jsonb key / value

I have a table as follows

 CREATE TABLE "PreStage".transaction (
   transaction_id serial NOT NULL,
   transaction jsonb,
   CONSTRAINT pk_transaction PRIMARY KEY (transaction_id)
 );

The content in the jsonb column of the transaction looks like

 {"ADDR": "abcd", "CITY": "abcd", "PROV": "", "ADDR2": "", "ADDR3": "","CNSNT": "Research-NA", "CNTRY": "NL", "EMAIL": "@.com", "PHONE": "12345", "HCO_NM": "HELLO", "UNQ_ID": "", "PSTL_CD": "1234", "HCP_SR_NM": "", "HCP_FST_NM": "", "HCP_MID_NM": ""} 

I need a search query, for example:

 SELECT transaction AS data FROM "PreStage".transaction WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'; 

But I have to give the user the flexibility to find any key / value on the fly.

The answer to the previous question suggested creating an index as:

 CREATE INDEX idxgin ON "PreStage".transaction USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops); 

That works, but I also want to index other keys, so I tried something like:

 CREATE INDEX idxgin ON "PreStage".transaction USING gin ((transaction->>'HCP_FST_NM'),(transaction->>'HCP_LST_NM') gin_trgm_ops) 

This does not work. What is the best indexing approach here? Or do I need to create a separate index for each key, in which case the approach will not be generic when a new key / value pair is added to the data?

+4
pattern-matching indexing postgresql jsonb
2 answers

Syntax error noted by @jjanes aside: for a mix of a few popular keys (contained in many rows and / or searched often) plus many more rare keys (contained in few rows and / or searched rarely, with new keys possibly appearing dynamically), I suggest this combination:

Trigram indices for popular keys

It looks like you will not often combine several keys in one search, and a single index covering many keys would grow very large and slow. Therefore, I would create a separate index for each popular key. Make it a partial index for keys that are not contained in most rows:

 CREATE INDEX trans_idxgin_HCP_FST_NM ON transaction  -- contained in most rows
 USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops);

 CREATE INDEX trans_idxgin_ADDR ON transaction  -- not in most rows
 USING gin ((transaction->>'ADDR') gin_trgm_ops)
 WHERE transaction ? 'ADDR';

Etc. As described in my previous answer:

  • Matching pattern for jsonb key / value
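On the syntax error in the question's two-key attempt: in a multicolumn index the operator class is given per indexed expression, so gin_trgm_ops has to follow each one. A minimal sketch of the corrected statement, assuming the pg_trgm extension can be installed in the database; the index name trans_idxgin_names is made up for illustration. For the reasons given above, separate single-key indexes are probably still the better choice.

```sql
-- gin_trgm_ops is provided by the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; the operator class is repeated for each expression.
CREATE INDEX trans_idxgin_names ON "PreStage".transaction
USING gin ((transaction->>'HCP_FST_NM') gin_trgm_ops,
           (transaction->>'HCP_LST_NM') gin_trgm_ops);
```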

Basic jsonb GIN index

If you have many different keys and / or new keys added dynamically, you can cover the rest with a basic GIN index using the default operator class jsonb_ops :

 CREATE INDEX trans_idxgin ON "PreStage".transaction USING gin (transaction); 

Among other things, this supports searching for keys. But you cannot use it for pattern matching on values.

  • What is the correct index for querying structures in arrays in Postgres jsonb?
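To illustrate what the default jsonb_ops index can and cannot help with, here is a short sketch using key names and values from the sample row in the question. The first two queries use operators that belong to the jsonb_ops operator class ( ? and @> ); the third is a pattern match on a value, which this index does not support.

```sql
-- Key existence: can use the jsonb_ops GIN index.
SELECT transaction_id
FROM   "PreStage".transaction
WHERE  transaction ? 'HCP_FST_NM';

-- Containment (exact key / value match): can use it as well.
SELECT transaction_id
FROM   "PreStage".transaction
WHERE  transaction @> '{"CNTRY": "NL"}';

-- Pattern match on a value: not supported by this index.
SELECT transaction_id
FROM   "PreStage".transaction
WHERE  transaction->>'CITY' ILIKE '%abc%';
```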

Query

Combine predicates that address both indices:

 SELECT transaction AS data
 FROM "PreStage".transaction
 WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%'
 AND transaction ? 'HCP_FST_NM';  -- even if that seems redundant

The second condition also matches our partial indices.

So either there is a specific trigram index for the given (popular) key, or there is at least an index to find the (few) rows containing a rare key, followed by a filter to match the values. The same query should give you the best of both worlds.

Be sure to run the latest version of Postgres; there have been recent updates to cost estimation. It is crucial that Postgres works with good estimates and current table statistics to choose the best query plan.
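A minimal sketch of how one might check both points. ANALYZE refreshes the planner statistics for the table, and EXPLAIN (ANALYZE) shows the plan that is actually chosen. Whether that plan uses the indexes depends on your data and your Postgres version, so treat this as a way to verify rather than a guarantee.

```sql
-- Which Postgres version is running?
SELECT version();

-- Refresh planner statistics for the table.
ANALYZE "PreStage".transaction;

-- Show the plan actually chosen (note: this executes the query).
EXPLAIN (ANALYZE, BUFFERS)
SELECT transaction AS data
FROM   "PreStage".transaction
WHERE  transaction->>'HCP_FST_NM' ILIKE '%neer%'
AND    transaction ? 'HCP_FST_NM';
```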

+3

There is no built-in index type that does exactly what you want: match an exact key together with a wildcard pattern on its value, without specifying in advance which keys to use. It should be possible to write an extension that does this, but that would be an awful lot of work, and I don't know of one that exists.

Your best option that works out of the box might be to cast the jsonb to text and index that text:

 create index on transaction using gin ((transaction::text) gin_trgm_ops); 

Then add the secondary condition to your query:

 SELECT transaction AS data FROM transaction WHERE transaction->>'HCP_FST_NM' ILIKE '%neer%' AND transaction::text ilike '%neer%'; 

Now it can use the index to find rows containing "neer" anywhere, and then recheck that "neer" occurs in the value for the key "HCP_FST_NM", rather than just somewhere else in the JSONB.

If your query word occurs in many places other than the value of the desired key, this may not give you very good performance. For example, if someone was looking for:

 transaction->>'EMAIL' ilike '%ADDR%' AND transaction::text ilike '%ADDR%'; 

If all your records have the same structure as the one you showed, the index will return every row, because every row contains "ADDR" as a key. Each row then fails the other condition, but only after a lot of work has been done.
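One rough way to gauge this effect for a given search term is to compare how many rows match anywhere in the text with how many match in the value of the wanted key. This is only a diagnostic sketch: it scans the whole table, and the column aliases are made up for illustration.

```sql
SELECT count(*) FILTER (WHERE transaction::text ILIKE '%ADDR%')       AS text_matches,  -- candidate rows from the text index
       count(*) FILTER (WHERE transaction->>'EMAIL' ILIKE '%ADDR%')   AS key_matches,   -- rows that really qualify
       count(*)                                                       AS total_rows
FROM   "PreStage".transaction;
```

If text_matches is close to total_rows while key_matches is small, the text index does little to narrow the search for that term.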

+3
