Handling + as a special character in Lucene search

How can I make sure Lucene returns relevant search results when my input contains terms like C++? Lucene seems to ignore the ++ characters.

Code details: when I execute this line, I get an empty search query.

    queryField = multiFieldQueryParser.Parse(inpKeywords);
    keywordsQuery.Add(queryField, BooleanClause.Occur.SHOULD);

And here is my custom analyzer:

    public class CustomAnalyzer : Analyzer
    {
        private static readonly WhitespaceAnalyzer whitespaceAnalyzer = new WhitespaceAnalyzer();

        public override TokenStream TokenStream(String fieldName, System.IO.TextReader reader)
        {
            TokenStream result = whitespaceAnalyzer.TokenStream(fieldName, reader);
            result = new StandardTokenizer(reader);
            result = new LowerCaseFilter(result);
            result = new StopFilter(result, stop_words);
            return result;
        }
    }

And I execute the search query as follows:

    indexSearcher.Search(searchQuery, collector);

I tried queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords)), but it still does not work. Here is a query that executes and returns no results: "+ (())"

Thanks.

3 answers

Since + is a special character in the query syntax, it must be escaped. The list of all characters that need to be escaped is here (see the bottom of the page).

You also need to be careful about the analyzer that you use during indexing. For example, StandardAnalyzer will drop +. You may need to use something like WhitespaceAnalyzer during both indexing and searching, which keeps special characters in the token stream. Keep in mind that you need to use the same analyzer when indexing and searching.
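
As a rough illustration of the point above, here is a minimal sketch of an analyzer that splits only on whitespace and lowercases the tokens, so a term like C++ should reach the index as c++. It assumes the Lucene.NET 2.9/3.0-style API that the question's code appears to use; the class name WhitespaceLowerCaseAnalyzer is made up for this example, and the stop-word filter is left out because the question does not show how stop_words is defined.

```csharp
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical analyzer for illustration; not part of Lucene.NET itself.
public class WhitespaceLowerCaseAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Split on whitespace only, so punctuation such as '+' stays inside the token.
        TokenStream result = new WhitespaceTokenizer(reader);
        // Lowercase so that "C++" and "c++" match each other.
        result = new LowerCaseFilter(result);
        return result;
    }
}
```

If you try this, pass the same analyzer instance (or at least the same class) both to the IndexWriter and to the MultiFieldQueryParser, and rebuild the index, since documents indexed with the old analyzer will not contain the ++ characters.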


In addition to choosing the right analyzer, you can use QueryParser.Escape(string s) to ensure that all special characters are correctly escaped.

Since this is a static method, you can use it even if you use MultiFieldQueryParser.

For example, you can try something like this:

    queryField = multiFieldQueryParser.Parse(QueryParser.Escape(inpKeywords));
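
To make the effect of escaping concrete, the sketch below imitates what escaping does to the input: it puts a backslash in front of each character that the classic query syntax treats as an operator. EscapeQueryText is a hypothetical stand-in written for this example, not the library's implementation, and the character list is an assumption based on the classic query parser syntax; in real code call QueryParser.Escape instead.

```csharp
using System;
using System.Text;

public static class QueryEscapeSketch
{
    // Characters assumed to be operators in the classic Lucene query syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical helper for illustration only; prefer QueryParser.Escape.
    public static string EscapeQueryText(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            // Prefix every operator character with a backslash.
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EscapeQueryText("C++"));    // prints C\+\+
        Console.WriteLine(EscapeQueryText("+ (())")); // prints \+ \(\(\)\)
    }
}
```

Note that escaping only stops the query parser from reading + as an operator. The escaped text is still passed through the analyzer, so if the analyzer drops +, the resulting query can still come out empty.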

Try encoding your search queries as UTF-8.

You can enable this as described in this article.
