I want to index a “compound word” such as “New York” as a single term in Lucene, not as the two separate terms “new” and “york.” That way, if someone searches for “new place,” documents containing “New York” will not match.
I don't think n-grams (specifically NGramTokenizer) are the right fit here, because I don't want to index every n-gram, only some specific ones.
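To make the goal concrete, here is a rough plain-Java sketch of the token output I'm after. It does not use any Lucene classes; the class name `CompoundTokens`, the `COMPOUNDS` set, and the `tokenize` method are all made up for illustration, and it assumes simple whitespace splitting, lowercasing, and two-word compounds only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CompoundTokens {
    // Hypothetical list of the specific two-word compounds to keep as one term.
    static final Set<String> COMPOUNDS = Set.of("new york");

    // Lowercases the text, splits on whitespace, and merges any adjacent
    // pair of words that appears in COMPOUNDS into a single term.
    // Punctuation is not handled; this only illustrates the desired output.
    static List<String> tokenize(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            if (i + 1 < words.length
                    && COMPOUNDS.contains(words[i] + " " + words[i + 1])) {
                terms.add(words[i] + " " + words[i + 1]); // one compound term
                i += 2;
            } else {
                terms.add(words[i]); // ordinary single-word term
                i++;
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I moved to New York")); // [i, moved, to, new york]
        System.out.println(tokenize("a new place"));         // [a, new, place]
    }
}
```

With terms produced like this, a query for “new” would not hit the “new york” term, which is the behavior I would like to reproduce inside a Lucene analysis chain.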
I did some research, and I know that I should write my own analyzer and possibly my own tokenizer. But I got a bit lost trying to work out how to extend TokenStream / TokenFilter / Tokenizer.
thanks