Auto-complete data from a huge table

I need to make autocomplete functionality and do something like this:

select field from huge_table where field like '%some string%'; 

The table contains 2 million rows, and I need fast responses with just a few suggestions. We use Postgres, and this query takes forever.

Is there an efficient way to do this with postgres? Or maybe I should use some other thing besides postgres?

Thanks!

+6
4 answers

If you are doing autocomplete, I assume you are looking for matches based on a prefix. The standard data structure for prefix-based searches is a trie.

If you cannot get adequate performance from Postgres using an index and a prefix-based search (some string%), you can periodically query all 2 million rows and build a trie, keeping it in parallel with the database.

A trie's worst-case lookup is O(m), where m is the length of your prefix, so autocomplete will be very fast once the trie is built.
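A minimal sketch of such a trie in Python (the class and method names here are illustrative, not from any library):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=10):
        # Walk down to the node for the prefix: O(m), m = len(prefix).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Depth-first collect up to `limit` completions under that node.
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            n, s = stack.pop()
            if n.is_word:
                results.append(s)
            for ch in sorted(n.children, reverse=True):
                stack.append((n.children[ch], s + ch))
        return results
```

After a bulk load of the 2 million field values, t.complete("anot") walks only the prefix path and the small subtree under it, so lookups stay fast regardless of table size.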

+2

You can add an index to the field you search on.

Also, if it can be avoided, do not use open-ended wildcards like %some string%; they really hurt performance. If possible, use some string% instead.
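For the anchored form to actually use a b-tree index, the index may need the text_pattern_ops operator class when the database locale is not "C". A sketch (field_prefix_idx is just an illustrative name):

```sql
-- B-tree index that supports LIKE 'prefix%' regardless of locale:
CREATE INDEX field_prefix_idx ON huge_table (field text_pattern_ops);

-- Anchored at the start, so the planner can use the index:
SELECT field FROM huge_table WHERE field LIKE 'some string%' LIMIT 10;
```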

+1

If you can afford the extra insert/update time, you could use the pg_trgm extension.

That link includes benchmarks on a table with 2 million records, so you can see what improvement to expect.
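The basic setup looks like this (a sketch; field_trgm_idx is an illustrative name). Unlike a plain b-tree index, a trigram index can serve open-ended patterns:

```sql
-- Enable the extension, then build a trigram GIN index on the column:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX field_trgm_idx ON huge_table USING gin (field gin_trgm_ops);

-- This index can accelerate even unanchored, case-insensitive matches:
SELECT field FROM huge_table WHERE field ILIKE '%some string%' LIMIT 10;
```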

+1

Depending on the specifics of your use case, it may be worth knowing that tsquery has syntax for matching word prefixes. Combine this with an indexed tsvector field and you can look up word prefixes very quickly.

Create your “huge” table:

 CREATE TABLE huge_table ( field text, field_tsv tsvector ); 

Add an index:

 CREATE INDEX field_tsv_idx ON huge_table USING gin(field_tsv); 

Add a trigger to update the indexed column:

 CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON huge_table FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(field_tsv, 'pg_catalog.english', field); 

Add some mock data:

 INSERT INTO huge_table (field) VALUES ('something nice');
 INSERT INTO huge_table (field) VALUES ('another thing');

Then query for the prefix, with some kind of limit:

 SELECT field FROM huge_table WHERE field_tsv @@ to_tsquery('anot:*') LIMIT 20;
     field
 ---------------
  another thing
 (1 row)

More in the docs, especially on index types, as your index can get quite large.

+1
