Google-like suggestions with PostgreSQL trigrams and full-text search

I want to implement a text search with suggestions like Google's.

I use PostgreSQL because of the magic of PostGIS.

I was thinking about using full-text search (FTS), but I saw that it cannot search for partial words, so I found this question and saw how trigrams work.

The main problem is that the search engine I'm working on is for Spanish. FTS handles stemming, dictionaries (synonyms, misspellings), UTF and so on. Trigrams work fine for partial words, but they only work for ASCII, and (obviously) they don't use things like dictionaries.

I was wondering whether there is a way to get the best of both.

Is it possible to use full-text search and trigrams together in PostgreSQL?

+7
2 answers

You can do it in Postgres and don’t need Lucene.

You can quote phrases in a tsquery or tsvector, as shown below, and append :* to a tsquery term to match by prefix:

    select
      '''new york city'''::tsvector  @@ '''new yo'':*'::tsquery, --true
      '''new york times'''::tsvector @@ '''new yo'':*'::tsquery, --true
      '''new york'''::tsvector       @@ '''new yo'':*'::tsquery, --true
      '''new'''::tsvector            @@ '''new yo'':*'::tsquery, --false
      'new'::tsvector                @@ '''new yo'':*'::tsquery, --false
      'new york'::tsvector           @@ '''new yo'':*'::tsquery  --false
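For comparison, here is a minimal sketch of the same prefix operator applied to text that has gone through to_tsvector() and to_tsquery() instead of raw casts. It assumes the built-in 'simple' configuration (lowercasing only, no stemming or stop words). Unlike the quoted-phrase version, it matches individual words in any order:

```sql
-- 'simple' config: tokens are lowercased, not stemmed, and no stop words are removed.
SELECT
  to_tsvector('simple', 'New York City') @@ to_tsquery('simple', 'new & yo:*') AS city,   -- true
  to_tsvector('simple', 'New Jersey')    @@ to_tsquery('simple', 'new & yo:*') AS jersey; -- false
```

Swapping 'simple' for 'spanish' should add Spanish stemming, but the prefix is then compared against stemmed lexemes, so it is worth checking how partial words behave on your own data.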

The main problem is that to_tsvector() and [plain]to_tsquery() will strip your quotes. You can write your own versions that don't (it is not that hard), or post-process their output to build your own term n-grams.
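One possible workaround, sketched here under stated assumptions rather than as a definitive implementation, is a small helper that turns raw user input into a tsquery whose last term is a prefix. The function name suggest_tsquery is hypothetical, the 'simple' configuration is an assumption, and the helper ANDs the terms instead of preserving phrase order:

```sql
-- Hypothetical helper (not part of PostgreSQL): build an AND-ed tsquery
-- from raw input and mark the last term as a prefix with :*.
CREATE OR REPLACE FUNCTION suggest_tsquery(input text) RETURNS tsquery
LANGUAGE sql STABLE AS $$
  SELECT to_tsquery('simple', string_agg(term, ' & ' ORDER BY ord) || ':*')
  FROM regexp_split_to_table(
         -- replace anything that is not a letter or digit with a space
         regexp_replace(lower(input), '[^[:alnum:]]+', ' ', 'g'),
         ' ') WITH ORDINALITY AS t(term, ord)
  WHERE term <> '';  -- drop empty pieces from leading/trailing separators
$$;

-- Usage: the typed text 'New Yo' becomes the query 'new' & 'yo':*
SELECT to_tsvector('simple', 'New York City') @@ suggest_tsquery('New Yo');
```

Stripping punctuation before calling to_tsquery() keeps operators and stray quotes in user input from raising syntax errors. Input with no letters or digits yields NULL.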

The extra single quotes above are just escapes; select $$ i heart 'new york city' $$::tsvector; is equivalent.
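To bring trigrams back into the picture, the pg_trgm documentation describes collecting the distinct words of your documents with ts_stat() and indexing them with trigrams, so that misspelled input can be mapped to words that actually occur. A minimal sketch, assuming the pg_trgm extension is available and a hypothetical table documents(body text):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical source table: documents(body text).
-- 'simple' keeps words unstemmed so they can be shown back to the user.
CREATE TABLE words AS
  SELECT word
  FROM ts_stat('SELECT to_tsvector(''simple'', body) FROM documents');

-- Trigram index to speed up similarity lookups on the word list.
CREATE INDEX words_trgm_idx ON words USING gin (word gin_trgm_ops);

-- Suggest known words close to a possibly misspelled input.
SELECT word, similarity(word, 'barselona') AS sml
FROM words
WHERE word % 'barselona'
ORDER BY sml DESC, word
LIMIT 5;
```

The suggested word can then be fed into a full-text query, so dictionaries and stemming apply to the search itself while trigrams only handle the fuzzy lookup. The words table is a snapshot and needs to be rebuilt as the documents change.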

+3

I would recommend taking a look at Lucene. It can be integrated natively in Java, easily in .NET, or through Solr and web services in PHP.

It has strong free-text search capabilities, ranking out of the box, and support for different languages through different Analyzers (including one for Spanish).

Last but not least, it is also very fast: for large volumes (say, a 4 GB index covering about 5,000,000 database rows), it is much faster than Postgres.

+2
