The project I'm working on indexes a set of data (containing long texts) and compares it against a list of words at a regular interval (every 15 to 30 minutes).
After a number of rounds, say the 35th, the following error occurred while starting to index the new data set in the 36th round:
[ERROR] (2011-06-01 10:08:59,169) org.demo.service.LuceneService.countDocsInIndex(?:?) : Exception on countDocsInIndex:
java.io.FileNotFoundException: /usr/share/demo/index/tag/data/_z.tvd (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
    at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:81)
    at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:299)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:580)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:556)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:736)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:274)
    at org.demo.service.LuceneService.countDocsInIndex(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.indexTweets(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.processTweets(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:636)
I have already tried raising the maximum number of open files:
ulimit -n <number>
But after some time, when an interval contained about 1050 rows of long texts, the same error occurred again, although so far it has happened only once.
Should I follow the recommendation to modify the Lucene IndexWriter mergeFactor, as suggested in "(Too many open files) - SOLR" (see the sketch below), or is this a problem with the amount of data being indexed?
I have also read that this is a choice between batch indexing and interactive indexing. How do you determine whether indexing is interactive? Simply by how frequently the index is updated? Should I classify this project as interactive indexing, then?
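For reference, this is roughly what tuning the merge factor would look like. It is only a sketch, assuming Lucene 3.0.x where the IndexWriter's default merge policy is a LogMergePolicy; the value 5 is purely illustrative, and writer refers to the IndexWriter from the fragment in the UPDATE below.

// Sketch only: assumes the default LogMergePolicy of a Lucene 3.0 IndexWriter;
// the mergeFactor value is an example, not a recommendation.
LogMergePolicy mergePolicy = (LogMergePolicy) writer.getMergePolicy();
mergePolicy.setMergeFactor(5);        // fewer segments merged (and held open) at once; default is 10
mergePolicy.setUseCompoundFile(true); // keep each segment packed into a compound file to reduce open file handles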
UPDATE: Here is a fragment of my IndexWriter setup:
writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
It looks like maxMerge (or is it the field length?) is already set to unlimited.
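As far as I understand, MaxFieldLength.UNLIMITED only controls how many tokens per field are indexed; it is unrelated to merging. The merge factor lives on the writer's merge policy and defaults to a finite value. A minimal check, assuming the same Lucene 3.0 writer as above:

// Sketch: read the effective merge factor off the writer's merge policy.
// MaxFieldLength.UNLIMITED affects field length only, not merging.
LogMergePolicy mp = (LogMergePolicy) writer.getMergePolicy();
System.out.println("mergeFactor = " + mp.getMergeFactor()); // typically 10 by default, not unlimited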