SQL Server full-text search for a hyphenated phrase does not return expected results

We have an application that uses a SQL Server 2008 database with full-text search. I am trying to understand why the following searches behave differently:

First, a phrase containing a hyphenated word, for example:

    contains(column_name, '"one two-three-four five"')

And secondly, an identical phrase, where the hyphens are replaced by spaces:

    contains(column_name, '"one two three four five"')

The full-text index uses the ENGLISH (1033) locale and the default system stop list.

From my observations of other full-text searches containing hyphenated words, the first should allow matches on one two three four five or one twothreefour five. Instead, it matches only one twothreefour five (and not one two-three-four five).


Test case

Setup:

    create table ftTest
    (
        Id int identity(1,1) not null,
        Value nvarchar(100) not null,
        constraint PK_ftTest primary key (Id)
    );
    insert ftTest (Value) values ('one two-three-four five');
    insert ftTest (Value) values ('one twothreefour five');
    create fulltext catalog ftTest_catalog;
    create fulltext index on ftTest (Value language 1033)
        key index PK_ftTest on ftTest_catalog;
    GO

Queries:

    --returns one match
    select * from ftTest where contains(Value, '"one two-three-four five"')

    --returns two matches
    select * from ftTest where contains(Value, '"one two three four five"')
    select * from ftTest where contains(Value, 'one and "two-three-four five"')
    select * from ftTest where contains(Value, '"one two-three-four" and five')
    GO

Cleanup:

    drop fulltext index on ftTest;
    drop fulltext catalog ftTest_catalog;
    drop table ftTest;
3 answers

http://support.microsoft.com/default.aspx?scid=kb;en-us;200043

"If a non-alphanumeric character is used in the search criteria (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates."

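As a minimal sketch of that advice, assuming the ftTest table from the question's test case, a LIKE predicate matches the hyphens literally because it compares characters and does not go through the word breaker:

```sql
-- Assumes the ftTest table from the question's setup script.
-- LIKE compares characters, so the hyphens must be present in the stored value.
select Id, Value
from ftTest
where Value like '%two-three-four%';
```

With the two sample rows, this should match only the row that actually contains the hyphens. The leading wildcard generally prevents an index seek, so this may be slow on large tables.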

In cases like this, where you cannot anticipate the behavior of the word breaker, it is always useful to run sys.dm_fts_parser on your strings to get an idea of how the words will be broken up and stored in the internal index.

For example, running sys.dm_fts_parser on "one two-three-four five" produces the following:

    select * from sys.dm_fts_parser('"one two-three-four five"', 1033, NULL, 0)

    --edited--
    group_id  phrase_id  occurrence  special_term  display_term
    1         0          1           Exact Match   one
    1         0          2           Exact Match   two-three-four
    1         0          2           Exact Match   two
    1         0          3           Exact Match   three
    1         0          4           Exact Match   four
    1         0          5           Exact Match   five

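As you can see from the results, the word breaker parses the string and emits six terms: the compound two-three-four and its first part two share position 2, followed by three, four, and five at positions 3 to 5. This may explain the results you see when you run your CONTAINS query.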
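
For comparison, the same function can be run on the space-separated phrase from the question. This is only a sketch: it reuses the arguments shown above (LCID 1033, NULL for no stoplist, 0 for accent-insensitive) and selects three of the columns the function returns. As far as I know, calling sys.dm_fts_parser requires membership in the sysadmin server role.

```sql
-- Parse the space-separated variant with the same word breaker settings.
select occurrence, special_term, display_term
from sys.dm_fts_parser('"one two three four five"', 1033, NULL, 0);
```

Comparing the occurrence values of the two outputs shows where the hyphenated compound and its parts sit relative to the surrounding words.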


Full-text search treats a word as a string of characters without spaces or punctuation. A non-alphanumeric character can “break” a word during a search. Because SQL Server full-text search is a word-based engine, punctuation is usually not considered and is ignored when the index is searched. Thus a CONTAINS clause such as CONTAINS(testing, "computer-failure") would match a row with the value "The failure to find my computer would be expensive."

For the explanation, see: https://support.microsoft.com/en-us/kb/200043
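
To see which terms were actually stored for a table, and so whether punctuation survived indexing, the index contents can be listed directly. A small sketch, assuming the ftTest table and full-text index from the question's test case and that the index has finished populating (population is asynchronous, so the result may be empty right after the index is created):

```sql
-- Assumes the ftTest table and its full-text index from the question's setup script.
-- Lists each indexed term and the number of rows it appears in.
select display_term, column_id, document_count
from sys.dm_fts_index_keywords(db_id(), object_id('ftTest'))
order by display_term;
```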
