Multiple matches for one word in a PostgreSQL full-text search synonym dictionary

I am trying out full-text search in PostgreSQL 8.3. It worked great, so I added synonyms (for example, "bob" == "robert") using a synonym dictionary. That works great too. But I noticed that it apparently allows only one synonym per word. That is, "al" cannot be both "albert" and "allen".

Is that right? Is there a way to have multiple dictionary matches in a PostgreSQL synonym dictionary?

For reference, here is my example dictionary file:

    bob robert
    bobby robert
    al alan
    al albert
    al allen

And the SQL that creates the full text search configuration:

    CREATE TEXT SEARCH DICTIONARY nickname (TEMPLATE = synonym, SYNONYMS = nickname);
    CREATE TEXT SEARCH CONFIGURATION dxp_name (COPY = simple);
    ALTER TEXT SEARCH CONFIGURATION dxp_name
        ALTER MAPPING FOR asciiword WITH nickname, simple;

What am I doing wrong? Thanks!

3 answers

This is a limitation of how synonym dictionaries work. What you can do is turn the mapping around, as in:

    bob robert
    bobby robert
    alan al
    albert al
    allen al

It should give the same end result: a search for any one of them will match the others.
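One way to check this is to ask the dictionary directly. The following is only a sketch: it assumes the reversed file above has been saved as nickname.syn in $SHAREDIR/tsearch_data, that the nickname dictionary and the dxp_name configuration from the question already exist, and that the statements run in a fresh session so the file is re-read.

```sql
-- Each proper name should now lexize to the shared nickname.
SELECT ts_lexize('nickname', 'alan');
SELECT ts_lexize('nickname', 'albert');
SELECT ts_lexize('nickname', 'allen');

-- 'al' itself has no entry in the reversed file, so the synonym dictionary
-- should pass it on to the next dictionary in the mapping (simple).
SELECT ts_lexize('nickname', 'al');

-- A document mentioning one name, searched by another name and by the nickname.
SELECT to_tsvector('dxp_name', 'albert smith') @@ to_tsquery('dxp_name', 'allen');
SELECT to_tsvector('dxp_name', 'albert smith') @@ to_tsquery('dxp_name', 'al');
```

If the setup matches those assumptions, both searches should match, because the document and the queries all reduce to the lexeme al. The side effect is that alan, albert and allen become indistinguishable to the search, which may or may not be what you want.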


A dictionary has to define a functional relationship between words and lexemes; otherwise it would not know which word to return when you lexize. In your example, al maps to three different values, which defines a multi-valued function, so the lexize function does not know what to return. As Magnus shows, you can instead lexize from the proper names alan, albert and allen to the nickname al.

Remember, however, that the point of an FTS dictionary is not to perform conversions per se, but to allow efficient indexing of semantically related words. This means that the lexeme does not need to resemble the original entry in any linguistic sense. Although you are right that many-to-many relationships cannot be defined, do you really need them? For example, to handle your vin example:

    vin vin
    vincent vin
    vincenzo vin
    vinnie vin

but you can also do this:

    vin grob
    vincent grob
    vincenzo grob
    vinnie grob

and get the same effect (although why you would want to is another story).

Thus, if you parse a document containing 11 variants of the name Vincent, to_tsvector will index all 11 occurrences under the single lexeme vin in the first case and under grob in the latter.
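To see what that looks like in practice, here is a small sketch. It assumes the first vin mapping above has been installed as the nickname.syn file behind the question's nickname dictionary, and that the dxp_name configuration from the question is in place; both are assumptions made for illustration only.

```sql
-- Four different spellings in the document...
SELECT to_tsvector('dxp_name', 'vin vincent vincenzo vinnie');
-- ...should collapse into one lexeme that carries all four positions,
-- along the lines of 'vin':1,2,3,4

-- Any of the spellings in a query reduces to the same lexeme.
SELECT to_tsquery('dxp_name', 'vincenzo');
```

With the grob file instead, the same statements should show grob wherever vin appears, which is why the choice of target lexeme does not matter for matching.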


The 8.4 documentation describes dict_xsyn, an extended synonym dictionary module. Maybe it will be useful?

http://www.postgresql.org/docs/8.4/interactive/dict-xsyn.html
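As a rough sketch of how that module might be applied to the original question: the statements below assume dict_xsyn has been installed from contrib (its installation script creates a dictionary named xsyn), and that a hypothetical rules file nickname.rules exists in $SHAREDIR/tsearch_data containing the single line `al alan albert allen`. The file name is made up for illustration.

```sql
-- Point the xsyn dictionary at the hypothetical nickname.rules file.
-- KEEPORIG = false asks for the synonyms only, without the original word.
ALTER TEXT SEARCH DICTIONARY xsyn (RULES = 'nickname', KEEPORIG = false);

-- One word can now yield several lexemes.
SELECT ts_lexize('xsyn', 'al');

-- Optionally place it ahead of simple in the question's configuration.
ALTER TEXT SEARCH CONFIGURATION dxp_name
    ALTER MAPPING FOR asciiword WITH xsyn, simple;
```

Following the pattern in the module's documentation, ts_lexize should return alan, albert and allen together for al. How a multi-lexeme result behaves inside to_tsvector and to_tsquery is worth testing on your own data before relying on it.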
