Failed to re-read Lucene TokenStream

I am using Lucene 4.6, and it is unclear to me how to reuse a TokenStream, because I am getting this exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

at the beginning of the second pass. I have read the Javadoc, but I must still be missing something. Here is a simple example that throws the exception above:

@Test
public void list() throws Exception {
  String text = "here are some words";
  TokenStream ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
  listTokens(ts);
  listTokens(ts);
}

public static void listTokens(TokenStream ts) throws Exception {
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  try {
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println("token text: " + termAtt.toString());
    }
    ts.end();
  }
  finally {
    ts.close();
  }
}

I tried not calling TokenStream.end() or TokenStream.close(), thinking that they only need to be called at the very end, but I get the same exception.

Can anyone suggest a solution?

1 answer

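The exception lists calling reset() multiple times as one possible cause, and that is what you are doing. The Tokenizer implementation explicitly does not allow it. The java.io.Reader API does not guarantee that every subclass supports reset(), so a Tokenizer cannot assume that the Reader passed to it can be reset.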
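The point about java.io.Reader can be checked without Lucene. The following is a minimal sketch using only the JDK; the class name ReaderResetDemo and the helper canRewind are hypothetical names chosen for illustration. It drains a Reader, calls reset(), and reports whether anything can be read again.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

class ReaderResetDemo {

  // Drains the reader, then tries to rewind it with reset().
  // Returns true only if reset() succeeds and a character can be read again.
  static boolean canRewind(Reader reader) {
    try {
      while (reader.read() != -1) {
        // discard characters until end of input
      }
      reader.reset();
      return reader.read() != -1;
    } catch (IOException e) {
      // Reader.reset() throws IOException when the subclass does not support it
      return false;
    }
  }

  public static void main(String[] args) {
    String text = "here are some words";

    // StringReader rewinds to the start of the string if it was never marked.
    Reader inMemory = new StringReader(text);

    // InputStreamReader inherits Reader.reset(), which is unsupported by default.
    Reader streamed = new InputStreamReader(
        new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
        StandardCharsets.UTF_8);

    System.out.println(canRewind(inMemory)); // prints true
    System.out.println(canRewind(streamed)); // prints false
  }
}
```

Since a Tokenizer accepts any Reader, it cannot rely on the first behavior, which is consistent with it refusing a second reset() on the same input.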

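You can simply construct a new TokenStream, or call Tokenizer.setReader(Reader) with a new Reader (in which case you must call close() on the tokenizer first).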
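As a rough sketch of the second option, assuming Lucene 4.6 is on the classpath: the class name TokenizerReuseSketch and the helper consume are hypothetical, and the helper follows the same reset / incrementToken / end / close cycle as listTokens in the question. The only change for the second pass is that the tokenizer gets a fresh Reader after it has been closed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

class TokenizerReuseSketch {

  // One complete consume cycle: reset, incrementToken until false, end, close.
  static List<String> consume(Tokenizer tokenizer) throws IOException {
    CharTermAttribute termAtt = tokenizer.addAttribute(CharTermAttribute.class);
    List<String> tokens = new ArrayList<String>();
    try {
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(termAtt.toString());
      }
      tokenizer.end();
    } finally {
      tokenizer.close();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    String text = "here are some words";
    Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));

    // First pass consumes the Reader given to the constructor.
    System.out.println(consume(tokenizer));

    // The tokenizer is closed at this point, so it can take a new Reader.
    tokenizer.setReader(new StringReader(text));

    // Second pass runs the same cycle over the new Reader.
    System.out.println(consume(tokenizer));
  }
}
```

Both passes should print the same token list. The first option is simpler still: build a new StandardTokenizer over a new StringReader for each pass and leave listTokens unchanged.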
