How can I configure Lucene.NET to run case-insensitive searches for words that contain special characters (for example, "C#" or ".net")?

StandardAnalyzer does not work. As far as I can tell, it reduces these terms to c and net.

WhitespaceAnalyzer will work, but it is case sensitive.

The general requirement is that the search should work like Google, so I am hoping there is either a configuration option that keeps terms such as .net and c# intact, or a workaround.

Following the suggestions below, I tried a custom WhitespaceAnalyzer, but keywords that are separated by commas without spaces are then processed incorrectly. For example,

 java,.net,c#,oracle 

is not returned by a search for any one of these keywords, which is incorrect.

I came across PatternAnalyzer, which splits text into tokens based on a pattern, but I cannot figure out how to use it in this scenario.

I am using Lucene.Net 3.0.3 and .NET 4.0.

3 answers

For others who may be looking for an answer:

The final solution was to create a custom TokenFilter and a custom analyzer that uses this filter together with WhitespaceTokenizer, LowerCaseFilter, and so on. It takes only about 30 lines of code. I will write a blog post and link to it here once I have set up a blog.


Write your own analyzer class, similar to the SynonymAnalyzer in "Lucene.Net - Custom Synonyms Analyzer". Your override of TokenStream can solve this by chaining a WhitespaceTokenizer with a LowerCaseFilter.

Remember that your indexer and your searcher must use the same analyzer.
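A minimal sketch of such an analyzer, assuming Lucene.Net 3.0.3 as mentioned in the question. The class name LowerCaseWhitespaceAnalyzer is made up for illustration. It splits on whitespace only, so c# and .net survive as single tokens, and then lower-cases each token:

```csharp
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical name; any Analyzer subclass that builds the same chain would do.
public sealed class LowerCaseWhitespaceAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Split on whitespace only, so '#' and '.' stay inside the token.
        TokenStream stream = new WhitespaceTokenizer(reader);

        // Lower-case every token so that "C#" and "c#" produce the same term.
        stream = new LowerCaseFilter(stream);

        return stream;
    }
}
```

The same instance (or at least the same class) would be passed both to the IndexWriter and to the QueryParser. Note that this sketch does not split on commas, so java,.net,c#,oracle is still indexed as one token.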

Update: handling multiple comma-separated keywords

If you need to handle comma-separated keywords only when searching, and not when indexing, you can convert the search expression expr as shown below.

 expr = expr.Replace(',', ' '); 

Then pass expr to QueryParser. If you want to support other delimiters such as ';', you can do it like this:

 var terms = expr.Split(new char[] { ',', ';' });
 expr = String.Join(" ", terms);

But you also need to check for a phrase expression such as "sybase,c#,.net,oracle" (the expression includes the quote characters), which should not be converted, because the user is looking for an exact match:

 expr = expr.Trim();
 if (!(expr.StartsWith("\"") && expr.EndsWith("\"")))
 {
     expr = expr.Replace(',', ' ');
 }

An expression can include both a phrase and some keywords, for example:

 "sybase,c#,.net,oracle" server,c#,.net,sybase 

Then you need to parse and translate the search expression into this:

 "sybase,c#,.net,oracle" server c# .net sybase 
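One possible way to do this conversion is a single pass over the expression that tracks whether the current character is inside a quoted phrase. The following is only a sketch: the class and method names are made up for illustration, and it assumes that quotes are balanced and never escaped.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class SearchExpression
{
    // Replaces ',' and ';' with spaces, but only outside double-quoted phrases.
    public static string NormalizeDelimiters(string expr)
    {
        if (expr == null) throw new ArgumentNullException("expr");

        var result = new StringBuilder(expr.Length);
        bool inQuotes = false;

        foreach (char c in expr.Trim())
        {
            // Every double quote toggles the "inside a phrase" state.
            if (c == '"') inQuotes = !inQuotes;

            bool isDelimiter = (c == ',' || c == ';');
            result.Append(!inQuotes && isDelimiter ? ' ' : c);
        }

        return result.ToString();
    }
}
```

The returned string can then be passed to QueryParser in place of the original expression. Consecutive delimiters simply become consecutive spaces, which the sketch does not collapse.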

If you also need to handle comma-separated keywords at indexing time, you need to parse the text for the comma-separated keywords and store them in a separate field, for example Keywords (which should be associated with your custom analyzer). Then your search handler should convert a search expression such as

 server,c#,.net,sybase 

into:

 Keywords:server Keywords:c# Keywords:.net Keywords:sybase 

or more simply:

 Keywords:(server c# .net sybase) 
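As a rough sketch of that conversion, assuming the field is called Keywords as above and that the expression contains plain keywords only (no quoted phrases and no QueryParser special characters). The helper name is hypothetical:

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.
public static class KeywordQuery
{
    // Turns "server,c#,.net,sybase" into "Keywords:(server c# .net sybase)".
    public static string ToFieldQuery(string expr, string field)
    {
        if (expr == null) throw new ArgumentNullException("expr");
        if (field == null) throw new ArgumentNullException("field");

        // Split on the supported delimiters and drop empty entries
        // produced by input such as "a, b" or "a,,b".
        string[] terms = expr.Split(
            new[] { ',', ';', ' ' },
            StringSplitOptions.RemoveEmptyEntries);

        return field + ":(" + string.Join(" ", terms) + ")";
    }
}
```

The resulting string is meant to be parsed by a QueryParser that uses the same analyzer as the Keywords field.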

Use the WhitespaceAnalyzer and chain it with a LowerCaseFilter.

Use the same chain when searching and indexing. By converting everything to lower case, you effectively make the search case-insensitive.

According to your description of the problem, this should work and be easy to implement.
