SQL Server, ISABOUT, Weighted Terms

I am trying to figure out exactly how the weighted terms work in the ISABOUT query in SQL SERVER.

This is where I am right now:

Each query returns the following lines:

QUERY 1 (weight 1): initial rating

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 306342 249 272619 156 221557 114 

QUERY 2 (weight 0.8): rating increases, the original order is maintained

 SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 306342 321 272619 201 221557 146 

QUERY 3 (weight 0.2): rating increases, the original order is maintained

 SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 306342 998 272619 877 221557 692 

QUERY 4 (weight 0.17): the rating decreases, the best match is the last, inverted behavior for these terms starts with 0.17

 SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 272619 960 221557 958 306342 802 

QUERY 5 (weight 0.16): the rating is increasing, the best match is now the second

 SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 272619 978 306342 935 221557 841 

QUERY 6 (weight 0.01): rating drops, best match last again

 SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01) ) ') ORDER BY RANK DESC, [KEY] KEY RANK 221557 105 272619 77 306342 50 

The best match for weight 1 has a rank of 249, and if the weight is reduced to 0.2, the ranking of the best match increases to 998. From 0.2 to 0.17, the ranking decreases, and from 0.16 the results are inverted (weight values ​​that reproduce this behavior , depend on the conditions and, perhaps, looking for the columns ...)

There seems to be a point where weight means the opposite, something like "do not include this term." Do you have an explanation for this behavior? Why does rating increase when weight decreases? Why does the rating decline after a certain point until the results are inverted, and how can you predict this point?

I use a custom word break when a user searches for something by creating the following query:

 CONTAINSTABLE(documentParts, title, 'ISABOUT ( "wordA wordB wordC" weight (0.8), "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), "wordA*" weight (0.1), "wordB*" weight (0.1), "wordC*" weight (0.1), ) ') 

Am I Expecting Big Ranks for 0.1 Words?
Is the following query the same as above, and can I expect strange behavior with a rating of 0.1?

 CONTAINSTABLE(documentParts, title, ' ISABOUT ( "wordA wordB wordC" weight (0.8) ), OR ISABOUT ( "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6) ), OR ISABOUT ( "wordA*" weight (0.1) ), OR ISABOUT ( "wordB*" weight (0.1) ), OR ISABOUT ( "wordC*" weight (0.1) ), ') 
+7
source share
2 answers

In my experience, I had better results when the scales were up to 1.

 CONTAINSTABLE(documentParts, content, 'ISABOUT ( "wordA wordB wordC" weight (0.5), "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2), "wordA*" weight (0.1), "wordB*" weight (0.1), "wordC*" weight (0.1) ) ') 
+4
source

As the clock is ticking, I get something like this that gives good results ...:

 SELECT [KEY], SUM([RANK]) AS [RANK] FROM ( SELECT [KEY], ([RANK]*1)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK] FROM CONTAINSTABLE(documentParts, content, 'ISABOUT ( "wordA wordB wordC" weight (0.8), "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), "wordA*" weight (0.4), "wordB*" weight (0.4), "wordC*" weight (0.4) ) ') c WHERE c.RANK>0 UNION ALL SELECT [KEY], ([RANK]*2)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK] FROM CONTAINSTABLE(documents, title, 'ISABOUT ( "wordA wordB wordC" weight (0.8), "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), "wordA*" weight (0.4), "wordB*" weight (0.4), "wordC*" weight (0.4) ) ') c WHERE c.RANK>0 ) t GROUP BY [KEY] ORDER BY [RANK] DESC 

I will pass it to the test group and call it day ...

+3
source

All Articles