I'm trying to improve our search capabilities for short phrases (in our case, movie titles), and I'm currently looking at SQL Server 2008 full text search, which provides some of the features we would like:
- Stemming (for example, "saw" also matches "see", "seen", etc.)
- Synonyms (for example, "6" is a synonym for "VI")
However, the ranking algorithm seems problematic when using FREETEXTTABLE with a search term and reading the RANK column. For example, when the user enters "saw", these are the results we get without a thesaurus:
RANK | Title
All of them have the same rank, although it would be clear to a person that the second and third entries are better matches than the others.
Similarly, entering “moon” gives the following results:
RANK | Title
And here, although no stemmed matches are involved, it would be clear to a person that the best match for “moon” is “Moon”, and not the longer titles that merely contain it as part of the name, yet FTS ranks them all the same.
I suppose this is down to the way SQL Server scores results: it appears to give stemmed words and synonyms the same weight as the original term, and to factor word density into the rank. That would be fine for long passages of text, but it doesn't really apply to short phrases like these. So I'm coming to the conclusion that, unfortunately, FTS isn't suited to this job.
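To make the ranking I'm after concrete, here is a minimal Python sketch of the behaviour I'd expect. It is not how FTS works and not a proposed implementation. The `score` and `rank` functions, the weights (1.0 for an exact match, 0.5 for a stemmed or synonym match), the length normalisation, and the sample titles are all hypothetical choices for illustration.

```python
import re


def tokenize(text):
    """Lower-case a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, variants=None):
    """Score a short title against a query.

    Exact token matches count fully, stemmed/synonym variants count half
    (assumed weights), and the total is divided by the number of tokens
    in the title so that short titles dominated by the query rank higher.
    """
    variants = variants or {}
    title_tokens = tokenize(title)
    if not title_tokens:
        return 0.0
    total = 0.0
    for term in tokenize(query):
        related = {v.lower() for v in variants.get(term, ())}
        for token in title_tokens:
            if token == term:
                total += 1.0   # the term exactly as typed
            elif token in related:
                total += 0.5   # stemmed form or synonym
    return total / len(title_tokens)


def rank(query, titles, variants=None):
    """Return (score, title) pairs with a non-zero score, best first."""
    scored = [(score(query, t, variants), t) for t in titles]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])


if __name__ == "__main__":
    # Illustrative titles only, not the actual rows from our database.
    sample = ["New Moon", "Moon", "Under the Same Moon"]
    for value, title in rank("moon", sample):
        print(f"{value:.2f} | {title}")
```

With a scheme like this, a one-word title that is exactly the query comes out on top, and a stemmed match such as "see" for "saw" ranks below a title containing the literal term.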
I really don't want to reinvent the wheel, so are there any existing search solutions that work well for titles and give good ranking plus stemming/thesaurus functionality? It would also be nice if it had spell checking to implement "did you mean ..." functionality like Google, so that "saww" would be corrected to "saw", "mon" to "moon", etc.
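For the "did you mean" part, this is roughly the behaviour I have in mind, sketched with Python's standard `difflib` module against a vocabulary built from the titles themselves. The `build_vocabulary` and `did_you_mean` helpers, the 0.75 cutoff, and the sample titles are assumptions for illustration. A real solution would presumably use something smarter than plain string similarity.

```python
import difflib
import re


def build_vocabulary(titles):
    """Collect the distinct lower-cased words that appear in the titles."""
    vocabulary = set()
    for title in titles:
        vocabulary.update(re.findall(r"[a-z0-9]+", title.lower()))
    return vocabulary


def did_you_mean(query, vocabulary, cutoff=0.75):
    """Return a corrected query, or None if no correction is suggested.

    Each word not found in the vocabulary is replaced by its closest
    vocabulary word, provided the similarity ratio reaches the cutoff.
    """
    candidates = sorted(vocabulary)  # sorted for deterministic results
    corrected = []
    changed = False
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)  # nothing close enough, keep as typed
    return " ".join(corrected) if changed else None


if __name__ == "__main__":
    # Illustrative titles only.
    vocab = build_vocabulary(["Saw", "Moon", "New Moon"])
    for typed in ("saww", "mon", "moon"):
        print(typed, "->", did_you_mean(typed, vocab))
```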
sql-server-2008 full-text-search
Greg Beech