Lucene 4 - How to remove numerical terms from an index?

I use Apache Tika to parse an XML document before indexing it with Apache Lucene.

This is the Tika part:

  // Allow up to 10 * 1024 * 1024 characters of extracted text
  BodyContentHandler handler = new BodyContentHandler(10 * 1024 * 1024);
  Metadata metadata = new Metadata();
  ParseContext pcontext = new ParseContext();

  // Parse the XML file; try-with-resources closes the stream afterwards
  try (FileInputStream inputstream = new FileInputStream(f)) {
      XMLParser xmlparser = new XMLParser();
      xmlparser.parse(inputstream, handler, metadata, pcontext);
  }

  return handler.toString(); // plain text extracted from the document

I use StandardAnalyzer with a list of stop words to tokenize my document:

 analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET);  // using stop words

Is it possible to drop numerical terms from the index? I do not need them.

Thank you for your help.
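
To make the goal concrete, here is a minimal sketch of one possible workaround that does not touch the analyzer at all: strip purely numeric tokens from the text Tika returns before passing it to Lucene. The class name NumericTermStripper and the method stripNumericTokens are hypothetical names made up for this illustration, and the regular expression only approximates what StandardAnalyzer would treat as a number (for example, "v1.2" would lose its trailing "2"). A token filter inside the analyzer chain is probably the more Lucene-native route; this sketch does not attempt that.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika or Lucene): removes number-only
// tokens from plain text before it is handed to the analyzer.
public class NumericTermStripper {

    // A run of digits, optionally with '.' or ',' between digit groups
    // (42, 3.14, 1,000), that is not glued to a letter or another digit.
    // Mixed tokens such as "B12" or "12abc" are therefore left alone.
    private static final Pattern NUMERIC_TOKEN = Pattern.compile(
            "(?<![\\p{L}\\p{N}])\\d+(?:[.,]\\d+)*(?![\\p{L}\\p{N}])");

    static String stripNumericTokens(String text) {
        // Replace each numeric token with a space so neighbours do not merge,
        // then collapse the resulting runs of whitespace.
        String withoutNumbers = NUMERIC_TOKEN.matcher(text).replaceAll(" ");
        return withoutNumbers.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String text = "order 42 shipped 3.14 kg to room B12";
        System.out.println(stripNumericTokens(text));
    }
}
```

With this approach the string returned by handler.toString() would be passed through stripNumericTokens before being added to the document, so the analyzer never sees the numbers.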
