Lucene 4 - How to remove numeric terms from an index?


I use Apache Tika to parse an XML document before indexing it with Apache Lucene.

This is the Tika part:

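  import java.io.FileInputStream;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.ParseContext;
  import org.apache.tika.parser.xml.XMLParser;
  import org.apache.tika.sax.BodyContentHandler;

  // inside a method that takes File f and throws IOException, SAXException, TikaException
  BodyContentHandler handler = new BodyContentHandler(10 * 1024 * 1024); // write limit: 10 MB of text
  Metadata metadata = new Metadata();
  ParseContext pcontext = new ParseContext();

  // XML parser; try-with-resources closes the input stream when parsing is done
  XMLParser xmlparser = new XMLParser();
  try (FileInputStream inputstream = new FileInputStream(f)) {
    xmlparser.parse(inputstream, handler, metadata, pcontext);
  }

  return handler.toString(); // plain text extracted from the XML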
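One possible workaround, sketched here as an assumption rather than a Lucene feature, is to strip standalone numbers from the text Tika returns before it ever reaches the analyzer. The helper name `stripNumbers` is hypothetical, and the regex only approximates what counts as a number; it will not match StandardTokenizer's own notion of a numeric token in every case.

```java
import java.util.regex.Pattern;

class StripNumbersSketch {

    // Matches standalone numbers such as "42", "3.14" or "1,000".
    // The \b anchors leave digits glued to letters (e.g. "v2", "mp3") untouched.
    private static final Pattern NUMBER = Pattern.compile("\\b\\d+(?:[.,]\\d+)*\\b");

    // Hypothetical helper: removes standalone numbers from the extracted text
    // (e.g. the result of handler.toString()) and collapses leftover whitespace.
    static String stripNumbers(String text) {
        String withoutNumbers = NUMBER.matcher(text).replaceAll(" ");
        return withoutNumbers.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "Lucene indexes and v2 tokens"
        System.out.println(stripNumbers("Lucene 4 indexes 3.14 and v2 tokens"));
    }
}
```

Doing this on raw text is simple but blunt: it also changes stored field content if the same string is stored, which is why a token-level filter is usually the cleaner place for it.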

I use StandardAnalyzer with a stop-word list to tokenize my document:

 analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET);  // using stop words

Is it possible to drop numeric terms from the index, since I do not need them?

Thank you for your help.
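In Lucene 4.x the usual place for this is a token filter in the analysis chain: for example, a subclass of `org.apache.lucene.analysis.util.FilteringTokenFilter` whose `accept()` method reads the current term from `CharTermAttribute` and returns false for numbers, placed after the standard filters in a custom `Analyzer`. Another option worth checking is `TypeTokenFilter` with the `<NUM>` token type that `StandardTokenizer` assigns to numbers. Constructor signatures differ between 4.x releases (older ones take a `Version`), so check the Javadoc for your version. The sketch below runs without Lucene and only isolates the keep/drop decision such an `accept()` would make; the class and method names are hypothetical, and the definition of "numeric" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NumericTermFilterSketch {

    // Mirrors what a FilteringTokenFilter.accept() override would decide:
    // return true to keep the token, false to drop it.
    // "Numeric" here (an assumption) means digits, optionally separated by
    // single '.' or ',' characters, e.g. "42", "3.14", "1,000".
    static boolean accept(String token) {
        return !token.matches("\\d+([.,]\\d+)*");
    }

    // Keeps only non-numeric tokens, preserving their order.
    static List<String> dropNumericTerms(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (accept(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "4", "released", "2015", "v2", "3.14");
        // prints [lucene, released, v2]
        System.out.println(dropNumericTerms(tokens));
    }
}
```

Mixed tokens such as "v2" are kept on purpose; if those should go too, the predicate is the only thing that needs to change.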


java lucene apache-tika

tommy Feb 10 '15 at 12:09


No one has answered this question yet.

See similar questions:

- Standard analyzer with stop

or similar:

- How do I read / convert an InputStream to a string in Java?
- How to generate random integers in a specific range in Java?
- How to convert String to int in Java?
- How to create a random alphanumeric string?
- Indexing n-word expressions as a single member in Lucene
- Lucene parser for indexing and searching
- Search Lucene using StopWords in StandardAnalyzer
- Lucene - How do I opt out of numerical terms when indexing?
- search in lucene index
- How to get parsed document terms - Lucene

