How to index a hyphenated word in Lucene?

I have working code based on StandardAnalyzer that extracts words and their frequencies from a single document using a TermVectorMapper, which fills a HashMap.

But if I use the following text as a field in my document, i.e.

addDoc(w, "lucene Lawton-Browne Lucene"); 

Word frequencies returned in HashMap:

browne 1 lucene 2 lawton 1

The problem is the words "lawton" and "browne". If this is an actual "double-barreled name", can Lucene recognize it as "Lawton-Browne", where the name is actually one word?

I tried combinations:

 addDoc(w, "lucene \"Lawton-Browne\" Lucene"); 

I also tried single quotes, but without success.

thanks

Mr. Morgan.

2 answers

If you still want to use a stop word list, I suggest you try PatternAnalyzer. It accepts such a list and comes with a predefined whitespace pattern, so tokens are split only on whitespace and the hyphenated name stays in one piece.

Or you wrap a WhitespaceAnalyzer in your own analyzer and do something like this in tokenStream(String fieldName, Reader reader):

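 public TokenStream tokenStream(String fieldName, Reader reader) {
     // Split on whitespace only, so "Lawton-Browne" stays a single token.
     TokenStream stream = myWhitespaceAnalyzer.tokenStream(fieldName, reader);
     // Then drop stop words from the whitespace-delimited tokens.
     stream = new StopFilter(stream, stopWords);
     return stream;
 }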
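To see what this approach does to the text from the question, here is a minimal plain-Java sketch. It does not call Lucene at all; it only mimics the idea of splitting on whitespace, filtering stop words, and counting terms. The class name HyphenTokenSketch, the method countTerms, and the stop word list are made up for illustration. Note that the sketch also lower-cases each token. A whitespace-only analyzer does not do that by itself, so in Lucene you would probably want a LowerCaseFilter in the chain as well, otherwise "lucene" and "Lucene" are counted as two different terms.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only: this does not use any Lucene classes.
class HyphenTokenSketch {

    // Split on whitespace only, lower-case each token, drop stop words,
    // and count how often each remaining term occurs.
    static Map<String, Integer> countTerms(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : text.trim().split("\\s+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || stopWords.contains(term)) {
                continue; // skip empty input and stop words
            }
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stop word list, not taken from the question.
        Set<String> stopWords = Set.of("a", "the", "of");
        System.out.println(countTerms("lucene Lawton-Browne Lucene", stopWords));
        // prints {lucene=2, lawton-browne=1}
    }
}
```

Because the hyphen is never treated as a separator, the name survives as the single term "lawton-browne". The trade-off is that other punctuation also stays attached to tokens (for example a trailing comma), which a real analyzer chain would have to handle separately.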