How to index a hyphen word in Lucene?

Question

I have a working StandardAnalyzer setup that extracts words and their frequencies from a single document using a TermVectorMapper, which populates a HashMap.

But if I use the following text as a field in my document:

addDoc(w, "lucene Lawton-Browne Lucene");

the word frequencies returned in the HashMap are:

 browne 1
 lucene 2
 lawton 1

The problem is the words "lawton" and "browne". If this is an actual double-barrelled name, can Lucene recognise it as "Lawton-Browne", where the name is actually a single word?

I have tried combinations such as:

 addDoc(w, "lucene \"Lawton-Browne\" Lucene");

and single quotes, but without success.

Thanks,

Mr. Morgan.

java lucene

Mr morgan Oct 24 '10 at 20:01

2 answers

csupnig · Answer 1 · 2011-04-20T19:12:09+0000

If you still want to use a stop word list, I suggest you try PatternAnalyzer. It accepts such a list and comes with a predefined whitespace pattern.

Alternatively, extend WhitespaceAnalyzer and do something like this in tokenStream(String fieldName, Reader reader):

 public TokenStream tokenStream(String fieldName, Reader reader) {
     // Split on whitespace only, then drop stop words.
     TokenStream stream = myWhitespaceAnalyzer.tokenStream(fieldName, reader);
     stream = new StopFilter(stream, stopWords);
     return stream;
 }
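
To illustrate why splitting on whitespace keeps the hyphenated name together, here is a minimal plain-Java sketch of the same idea. It does not use Lucene at all: the class WhitespaceTermCount, the method countTerms, and the stop word set are hypothetical names made up for this example, and the lowercasing step is an extra assumption (the snippet above does not lowercase, so "lucene" and "Lucene" would otherwise be counted separately).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class WhitespaceTermCount {
    // Hypothetical helper, not a Lucene API: split on whitespace only,
    // optionally lowercase, drop stop words, and count what remains.
    static Map<String, Integer> countTerms(String text, Set<String> stopWords, boolean lowerCase) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields one empty token
            }
            String term = lowerCase ? token.toLowerCase(Locale.ROOT) : token;
            if (stopWords.contains(term)) {
                continue; // skip stop words
            }
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The hyphen is not a split point, so the name survives as one term.
        Map<String, Integer> counts =
                countTerms("lucene Lawton-Browne Lucene", Set.of("the", "a"), true);
        System.out.println(counts); // prints {lucene=2, lawton-browne=1}
    }
}
```

In a real index the equivalent would be the analyzer chain from the snippet above, probably with a lowercasing filter added if case-insensitive counts are wanted.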

Aaron saunders · Answer 2 · 2010-10-24T20:16:13+0000

Escape the special characters.

See the Lucene documentation:

http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters
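
As a rough sketch of what escaping looks like, the plain-Java helper below puts a backslash in front of each character that the linked page treats as special in query syntax. The class QueryEscapeSketch and its escape method are hypothetical and written only for this example; Lucene's QueryParser provides its own static escape(String) method, which is what you would normally call. Note that this concerns strings passed to the query parser; text handed to addDoc is still tokenized by whichever analyzer the index uses.

```java
class QueryEscapeSketch {
    // Characters treated as special here: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    // (assumption: each | and & is escaped individually).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical helper: prefix every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Lawton-Browne")); // prints Lawton\-Browne
    }
}
```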
