List of Tokens in Lucene 3

I am new to Lucene. I started studying the version 3 branch, and there is one thing I do not understand (obviously, since I am not experienced with the topic).

In Lucene 2.9, if I needed a list of tokens, I would create an ArrayList of Token objects, i.e. ArrayList<Token>. This is quite intuitive to me, and the concept of a token is very clear.

Now that the Token class has been superseded by an attribute-based API, do I need to write my own class to encapsulate the attributes I need? If so, isn't that just recreating Lucene's Token class?

I am writing a class to test analyzers, and I think a list of the resulting tokens would make testing easier.

Any help would be appreciated ;) Thank you!
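The reason a list of tokens no longer falls out for free is that an attribute-based stream reuses the same attribute instances for every token: each `incrementToken()` call overwrites them in place. The sketch below models that behavior in plain Java, without Lucene, so it runs on its own. `ReusingStream` and `TermAttr` are hypothetical stand-ins, not Lucene classes. Storing the attribute object itself yields a list of aliases to one object, while copying its value on each step yields the expected list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AttributeReuseDemo {
    // Hypothetical stand-in for a term attribute: one mutable instance per stream.
    static final class TermAttr {
        private String term = "";
        String term() { return term; }
        void setTerm(String t) { term = t; }
    }

    // Hypothetical stand-in for an attribute-based stream: incrementToken()
    // overwrites the single shared attribute instead of returning a new object.
    static final class ReusingStream {
        final TermAttr termAttr = new TermAttr();
        private final Iterator<String> words;

        ReusingStream(String text) {
            words = Arrays.asList(text.split("\\s+")).iterator();
        }

        boolean incrementToken() {
            if (!words.hasNext()) return false;
            termAttr.setTerm(words.next());
            return true;
        }
    }

    // Wrong: every list entry is the same object, so all of them show the last term.
    static List<String> aliasedTerms(String text) {
        ReusingStream stream = new ReusingStream(text);
        List<TermAttr> attrs = new ArrayList<>();
        while (stream.incrementToken()) {
            attrs.add(stream.termAttr);
        }
        List<String> out = new ArrayList<>();
        for (TermAttr a : attrs) out.add(a.term());
        return out;
    }

    // Right: copy the current value out of the attribute on each step.
    static List<String> copiedTerms(String text) {
        ReusingStream stream = new ReusingStream(text);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.termAttr.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(aliasedTerms("quick brown fox")); // [fox, fox, fox]
        System.out.println(copiedTerms("quick brown fox"));  // [quick, brown, fox]
    }
}
```

So a list for testing does not require a full replacement for Token; it only requires copying whatever values you need (here just the term text) before advancing the stream.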

3 answers

According to the Token Javadoc, "Although there is no need to use Token anymore, with the new TokenStream API, it can be used as a convenience class that implements all attributes, which is especially useful for simply moving from the old to the new TokenStream API."

I suggest using Token, as described above.
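The idea of "a convenience class that implements all attributes" can be pictured with a small model: one object implements several attribute interfaces, the stream fills it in place, and a copy per step gives an independent entry for a list. Everything here (`TermAttr`, `OffsetAttr`, `SimpleToken`, `tokenize`) is hypothetical plain Java written for illustration, not Lucene's own classes; it only mirrors the design of a single object standing in for several attributes.

```java
import java.util.ArrayList;
import java.util.List;

public class ConvenienceTokenDemo {
    // Hypothetical attribute interfaces, each exposing one facet of a token.
    interface TermAttr { String term(); }
    interface OffsetAttr { int startOffset(); int endOffset(); }

    // Hypothetical convenience class: a single object implements every attribute,
    // similar in spirit to how Token implements the attribute interfaces.
    static final class SimpleToken implements TermAttr, OffsetAttr {
        private String term;
        private int start;
        private int end;

        void set(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        public String term() { return term; }
        public int startOffset() { return start; }
        public int endOffset() { return end; }

        // Independent snapshot, so later updates to the shared instance do not leak in.
        SimpleToken copy() {
            SimpleToken t = new SimpleToken();
            t.set(term, start, end);
            return t;
        }

        @Override
        public String toString() { return term + "(" + start + "," + end + ")"; }
    }

    // Splits on spaces, reusing one SimpleToken and snapshotting it for each token.
    static List<SimpleToken> tokenize(String text) {
        SimpleToken shared = new SimpleToken();
        List<SimpleToken> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int next = text.indexOf(' ', pos);
            int end = next < 0 ? text.length() : next;
            if (end > pos) { // skip empty pieces between consecutive spaces
                shared.set(text.substring(pos, end), pos, end);
                out.add(shared.copy());
            }
            pos = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("quick brown fox")); // [quick(0,5), brown(6,11), fox(12,15)]
    }
}
```

The trade-off is the one the question raises: a class like this bundles every attribute in one place, which is convenient for tests, while the attribute API itself lets each stream carry only the attributes it needs.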


Use the TermAttribute class:

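import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

TokenStream stream = analyzer.tokenStream("field", new StringReader("text")); // 3.x takes a Reader
TermAttribute termAttr = stream.addAttribute(TermAttribute.class);
while (stream.incrementToken()) {
    String token = termAttr.term(); // text of the current token
}
stream.close();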
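For testing analyzers, the loop above is usually wrapped in a helper that drains the stream into a List<String> and compares it with the expected terms. The sketch below keeps Lucene out so it runs standalone: `TermSource` is a hypothetical minimal interface standing in for a TokenStream plus its TermAttribute, and `lowercaseWhitespace`, `collectTerms` and `assertTerms` are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermCollectorDemo {
    // Hypothetical minimal view of an attribute-based stream: advance, then read
    // the current term (the role TokenStream + TermAttribute play in Lucene 3.0).
    interface TermSource {
        boolean incrementToken();
        String term();
    }

    // Hypothetical lower-casing whitespace "analyzer", used only for the demo.
    static TermSource lowercaseWhitespace(String text) {
        final String trimmed = text.trim();
        final String[] parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        return new TermSource() {
            private int i = -1;
            public boolean incrementToken() { return ++i < parts.length; }
            public String term() { return parts[i].toLowerCase(Locale.ROOT); }
        };
    }

    // Drains the source, copying each term out before the next call replaces it.
    static List<String> collectTerms(TermSource source) {
        List<String> terms = new ArrayList<>();
        while (source.incrementToken()) {
            terms.add(source.term());
        }
        return terms;
    }

    // Test helper: reports the first differing position, which is easier to read
    // than a whole-list mismatch, then checks that the lengths agree.
    static void assertTerms(TermSource source, String... expected) {
        List<String> actual = collectTerms(source);
        for (int i = 0; i < Math.min(actual.size(), expected.length); i++) {
            if (!actual.get(i).equals(expected[i])) {
                throw new AssertionError("term " + i + ": expected '" + expected[i]
                        + "' but got '" + actual.get(i) + "'");
            }
        }
        if (actual.size() != expected.length) {
            throw new AssertionError("expected " + expected.length + " terms but got "
                    + actual.size() + ": " + actual);
        }
    }

    public static void main(String[] args) {
        assertTerms(lowercaseWhitespace("The Quick  Fox"), "the", "quick", "fox");
        System.out.println(collectTerms(lowercaseWhitespace("The Quick  Fox"))); // [the, quick, fox]
    }
}
```

With Lucene, the body of `collectTerms` would be the `while (stream.incrementToken())` loop shown above, adding `termAttr.term()` to the list on each iteration.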

Alternatively, work with Token objects:

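TokenStream tkst = analyzer.tokenStream("field", new StringReader("text"));
// getAttribute(Token.class) throws IllegalArgumentException: attributes are registered
// under their interfaces, so request the interfaces Token implements instead.
TermAttribute termAttr = tkst.addAttribute(TermAttribute.class);
OffsetAttribute offsetAttr = tkst.addAttribute(OffsetAttribute.class);
while (tkst.incrementToken()) {
    Token token = new Token(termAttr.term(), offsetAttr.startOffset(), offsetAttr.endOffset());
    // Do something with token.
}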

See: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/package-summary.html

