Noise Words in SQL Server 2005 Full-Text Search

I am trying to use full-text search over a list of names in my database. This is my first attempt at using full-text search. I currently take the search string entered and put a NEAR condition between each term (i.e., the entered phrase “Kings of Leon” becomes “Kings NEAR of NEAR Leon”).

Unfortunately, I have found that this tactic produces false negatives: SQL Server drops the word "of" when it builds the index, because it is a noise word. Thus, “Kings Leon” will match correctly, but “Kings of Leon” will not.

My colleague suggests taking all the noise words defined in MSSQL\FTData\noiseENG.txt and putting them in the .NET code, so that the noise words can be stripped out before the full-text search is run.

Is this the best solution? Is there some auto-magic setting I can change in SQL Server to do this for me? Or is there perhaps a better solution that doesn't feel as hacky?

2 answers

Full-text search works off the search criteria that you give it. You can remove the noise word from the file, but you really risk bloating your index size by doing that. Robert Cain has a lot of good information on his blog about this:

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

To save some time, you can look at how this method removes the noise words and copy the code and the word list:

    public string PrepSearchString(string sOriginalQuery)
    {
        // Noise words, each padded with spaces so that only whole words match.
        string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

        string[] arrNoiseWord = strNoiseWords.Split("|".ToCharArray());

        // Replace each noise word with a single space.
        // Note: the match is case-sensitive and needs a space on both sides,
        // so a noise word at the very start or end of the query is left in place.
        foreach (string noiseword in arrNoiseWord)
        {
            sOriginalQuery = sOriginalQuery.Replace(noiseword, " ");
        }

        // Replace double spaces with a single space.
        sOriginalQuery = sOriginalQuery.Replace("  ", " ");

        return sOriginalQuery.Trim();
    }

However, I would probably go with Regex.Replace for this, which should be much faster than looping. I just don't have a quick example to post.
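For readers who want to see what the Regex.Replace variant might look like, here is a minimal sketch. It is not the answerer's code: the class name NoiseWordFilter and the abbreviated word list are assumptions made for illustration, and in practice the full noise word list would be substituted. Whether it is actually faster than the loop would need to be measured.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class NoiseWordFilter
{
    // Abbreviated, illustrative list; substitute the full noise word list.
    private static readonly string[] NoiseWords =
        { "about", "after", "all", "an", "and", "for", "from", "of", "the", "to" };

    // One alternation pattern, built once. Regex.Escape protects entries such
    // as "$" that are regex metacharacters. The lookarounds require whitespace
    // (or the start/end of the string) on both sides, so only whole tokens match.
    private static readonly Regex NoiseRegex = new Regex(
        @"(?<!\S)(?:" +
        string.Join("|", NoiseWords.Select(Regex.Escape).ToArray()) +
        @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Strip(string query)
    {
        // Replace every noise word with a space, then collapse runs of whitespace.
        string stripped = NoiseRegex.Replace(query, " ");
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Unlike the loop, this sketch is case-insensitive and also removes noise words at the very start or end of the query. For example, NoiseWordFilter.Strip("The Kings of Leon") yields "Kings Leon".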


Here is a working function. The file noiseENU.txt is copied as-is from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData.

    ' Requires "Imports System.Text.RegularExpressions" at the top of the file.
    ' ReadFile is not shown here; it is assumed to return the file contents as a string.
    Public Function StripNoiseWords(ByVal s As String) As String
        Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
        ' Turn the whitespace-separated list into an alternation: about|after|all|also etc.
        Dim NoiseWordsRegex As String = Regex.Replace(NoiseWords, "\s+", "|")
        NoiseWordsRegex = String.Format("\s?\b(?:{0})\b\s?", NoiseWordsRegex)
        ' Replace each noise word with a space
        Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase)
        ' Eliminate any multiple spaces
        Result = Regex.Replace(Result, "\s+", " ")
        Return Result
    End Function
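To tie this back to the question: once the noise words are stripped, the remaining terms still have to be joined with NEAR. A minimal sketch of that step follows, in C# to match the first answer. The class name NearQueryBuilder is hypothetical, and quoting each term is a design choice made here, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of either answer.
public static class NearQueryBuilder
{
    // Joins the whitespace-separated terms of an already-stripped search
    // string with NEAR, wrapping each term in double quotes.
    public static string Build(string strippedQuery)
    {
        string[] terms = strippedQuery.Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        var quoted = new List<string>();
        foreach (string term in terms)
        {
            // Drop embedded double quotes so a term cannot break out of its quotes.
            string clean = term.Replace("\"", "");
            if (clean.Length > 0)
            {
                quoted.Add("\"" + clean + "\"");
            }
        }

        return string.Join(" NEAR ", quoted.ToArray());
    }
}
```

With the stripped string "Kings Leon" this produces "Kings" NEAR "Leon", which can then be passed as a parameter to CONTAINS. If every term was a noise word the result is an empty string, and the caller should skip the full-text query in that case.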