Does the Sitecore 7 ContentSearch API remove stop words from queries?

I found that searches containing "of", "and", "the", etc. do not return results because Lucene removes the stop words. So if I search for an item called "The Consequences of the First World War", I get zero results.

But if I strip out "of" and "the" and search for "Consequences First World War", I get the expected document.

Does the ContentSearch API strip stop words from queries? Can Lucene be configured not to remove them? Or should I strip these stop words myself before building my query?

Thanks, Adam

+7
lucene sitecore sitecore7
2 answers

You can configure the Sitecore standard analyzer to use your own set of stop words. Create a text file with the stop words (one word per line), and then make the following configuration change in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file:

<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net">
    <param hint="version">Lucene_30</param>
    <param desc="stopWords" type="System.IO.FileInfo, mscorlib">
      <param hint="fileName">[FULL_PATH_TO_SITECORE_ROOT_FOLDER]\Data\indexes\stopwords.txt</param>
    </param>
  </param>
</param>
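As an illustration only, here is what a small stopwords.txt might look like. The word list is an assumption chosen for this example, not a recommendation. The point is that any word you want to remain searchable, such as "of" and "the" from the question, is simply left out of the file. I would also avoid blank lines and comment lines, since I have not checked how the loader treats them.

```text
a
an
and
but
or
```

Because the analyzer is also applied when items are indexed, the index most likely needs to be rebuilt before the new stop word list takes effect.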

Further reading: I wrote a blog post about this issue that may help: http://blog.horizontalintegration.com/2014/03/19/sitecore-standard-analyzer-managing-you-own-stop-words-filter/

+2

I think this is the same problem as the one described in the blog post.

Can you try the steps from the blog post?

Another option would be to create a custom analyzer and pass your own stop word list to the base constructor. Something like:

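 using System.Collections.Generic;

 public class CustomAnalyzer : Lucene.Net.Analysis.Standard.StandardAnalyzer
 {
     // Words listed here are treated as stop words, i.e. dropped from the index and from queries.
     // Assumes Lucene.Net 3.0.x, where this constructor takes an ISet<string> rather than a Hashtable.
     private static readonly ISet<string> stopWords = new HashSet<string> { "of", "stopword2" };

     public CustomAnalyzer() : base(Lucene.Net.Util.Version.LUCENE_30, stopWords) { }
 }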

After that, you must update your configuration file to use the new analyzer. You can find a good blog post about analyzers here . PS: I have not tested this code, so I cannot confirm that it really works.
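A rough sketch of what that configuration change might look like, modelled on the defaultAnalyzer element from the other answer. The type name MyProject.Search.CustomAnalyzer and the assembly name MyProject are hypothetical placeholders for wherever the class actually lives. I am assuming that a param element with no child elements resolves to the parameterless constructor, and I have not verified this against a running Sitecore instance.

```xml
<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <!-- Hypothetical type and assembly names; replace with your own namespace and DLL -->
  <param desc="defaultAnalyzer" type="MyProject.Search.CustomAnalyzer, MyProject" />
</param>
```

As with any analyzer change, the index would probably need to be rebuilt afterwards so that indexed content and queries are analyzed the same way.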

+1