The project I'm working on indexes a set of data (containing long texts) and compares it against a list of words at a regular interval (every 15 to 30 minutes).
After a number of rounds, say the 35th, the following error occurred while starting to index the new data set in the 36th round:
[ERROR] (2011-06-01 10:08:59,169) org.demo.service.LuceneService.countDocsInIndex(?:?) : Exception on countDocsInIndex:
java.io.FileNotFoundException: /usr/share/demo/index/tag/data/_z.tvd (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
    at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:81)
    at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:299)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:580)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:556)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:736)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:274)
    at org.demo.service.LuceneService.countDocsInIndex(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.indexTweets(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.processTweets(Unknown Source)
    at org.demo.processing.worker.DataFilterWorker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:636)
I have already tried raising the maximum number of open files:
ulimit -n <number>
But after some time, when an interval contained about 1050 rows of long texts, the same error occurred again, although so far it has happened only once.
Should I follow the recommendation to modify the Lucene IndexWriter mergeFactor, as suggested in "(Too many open files) - SOLR" (see the sketch below), or is this a problem with the amount of data being indexed?
I have also read that this is a choice between batch indexing and interactive indexing. How do you determine whether indexing is interactive? Simply by how frequently the index is updated? Should I classify this project as interactive indexing, then?
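For reference, this is roughly what tuning the merge factor would look like. It is only a sketch, assuming Lucene 3.0.x where the IndexWriter's default merge policy is a LogMergePolicy; the value 5 is purely illustrative, and writer refers to the IndexWriter from the fragment in the UPDATE below.

// Sketch only: assumes the default LogMergePolicy of a Lucene 3.0 IndexWriter;
// the mergeFactor value is an example, not a recommendation.
LogMergePolicy mergePolicy = (LogMergePolicy) writer.getMergePolicy();
mergePolicy.setMergeFactor(5);        // fewer segments merged (and held open) at once; default is 10
mergePolicy.setUseCompoundFile(true); // keep each segment packed into a compound file to reduce open file handles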
UPDATE: Here is a fragment of my IndexWriter setup:
writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
It looks like maxMerge (or is it the field length?) is already set to unlimited.
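As far as I understand, MaxFieldLength.UNLIMITED only controls how many tokens per field are indexed; it is unrelated to merging. The merge factor lives on the writer's merge policy and defaults to a finite value. A minimal check, assuming the same Lucene 3.0 writer as above:

// Sketch: read the effective merge factor off the writer's merge policy.
// MaxFieldLength.UNLIMITED affects field length only, not merging.
LogMergePolicy mp = (LogMergePolicy) writer.getMergePolicy();
System.out.println("mergeFactor = " + mp.getMergeFactor()); // typically 10 by default, not unlimited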