What is the fastest way to scan a very large file in java?

Imagine that I have a very large text file. Efficiency really matters.

All I want to do is scan it to find a specific line. Perhaps I also want to count how many lines it has, but that is really not the problem.

So, what is the fastest way?

I don't care how elegant the solution is; it just needs to be fast.

+6
java performance string-search
8 answers

For a single search, use the Scanner as suggested here.

A simple technique that may well be significantly faster than indexOf() is to use a Scanner with its findWithinHorizon() method. If you use the constructor that takes a File object, the Scanner will use a FileChannel under the hood to read the file, and for pattern matching it will end up using a Boyer-Moore algorithm, which is efficient for string searching.
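A minimal sketch of that technique; the class and method names here are illustrative, not from any library:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Scanner;

public class ScanFile {
    // Returns the first match of the regex in the file, or null if absent.
    // The File-based constructor lets Scanner read via a FileChannel.
    static String findFirst(File file, String regex) throws IOException {
        try (Scanner scanner = new Scanner(file)) {
            // A horizon of 0 means "search to the end of the input".
            return scanner.findWithinHorizon(regex, 0);
        }
    }
}
```

Call it as `ScanFile.findFirst(new File("big.txt"), "some pattern")`; the Scanner is closed automatically by try-with-resources.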

+16

First of all, use NIO (FileChannel), not the java.io classes. Secondly, use an efficient string search algorithm like Boyer-Moore.

If you need to search the same file several times for different strings, you will want to build some kind of index, so look at Lucene.
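As a sketch of the second point, here is a minimal Boyer-Moore-Horspool search (a simplified variant of Boyer-Moore) over raw bytes; the class and method names are illustrative:

```java
import java.util.Arrays;

public class Horspool {
    // Boyer-Moore-Horspool: precompute a bad-character table, then skip
    // ahead by the table entry for the last byte of the current window.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) return 0;
        int[] shift = new int[256];
        Arrays.fill(shift, needle.length);
        for (int i = 0; i < needle.length - 1; i++) {
            shift[needle[i] & 0xFF] = needle.length - 1 - i;
        }
        int pos = 0;
        while (pos <= haystack.length - needle.length) {
            int j = needle.length - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) j--;
            if (j < 0) return pos;                          // full match
            pos += shift[haystack[pos + needle.length - 1] & 0xFF];
        }
        return -1;
    }
}
```

For long needles this examines far fewer bytes than a naive scan, because mismatches let it jump up to needle.length positions at once.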

+4

Load the entire file into memory, and then look at a string search algorithm like Knuth-Morris-Pratt.

Edit:
A quick Google search turns up this string search library, which appears to implement several different string search algorithms. Note that I have never used it, so I cannot vouch for it.
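A minimal sketch of this approach, assuming the file fits comfortably in the heap: read it into a String with Files.readString() (Java 11+) and run a hand-rolled Knuth-Morris-Pratt search over it. The names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WholeFileSearch {
    // Knuth-Morris-Pratt: build the failure table for the pattern,
    // then scan the text once without ever backing up in it.
    static int kmpIndexOf(String text, String pattern) {
        if (pattern.isEmpty()) return 0;
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) k = fail[k - 1];
            if (pattern.charAt(i) == pattern.charAt(k)) k++;
            fail[i] = k;
        }
        k = 0;
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) k = fail[k - 1];
            if (text.charAt(i) == pattern.charAt(k)) k++;
            if (k == pattern.length()) return i - k + 1;   // match start index
        }
        return -1;
    }

    // Loads the whole file into memory, then searches it.
    static int searchFile(Path file, String pattern) throws IOException {
        return kmpIndexOf(Files.readString(file), pattern);
    }
}
```

KMP's advantage is linear worst-case time; for typical text, Boyer-Moore variants are usually faster in practice.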

+1

Whatever the specifics, memory-mapped I/O is usually the answer.

Edit: depending on your requirements, you could try importing the file into an SQL database and then querying it via JDBC.

Edit2: this thread in JavaRanch has some other ideas related to FileChannel. I think this may be exactly what you are looking for.
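A minimal sketch of the memory-mapped approach with FileChannel; note that a single map() call is limited to about 2 GB, so a truly huge file would have to be mapped in windows. The class name is illustrative:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSearch {
    // Memory-maps the file and scans the mapped region for the needle,
    // letting the OS page data in on demand instead of copying it
    // through Java-side read() buffers.
    static int find(Path file, byte[] needle) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int limit = buf.limit();
            for (int i = 0; i + needle.length <= limit; i++) {
                int j = 0;
                while (j < needle.length && buf.get(i + j) == needle[j]) j++;
                if (j == needle.length) return i;
            }
            return -1;
        }
    }
}
```

The inner loop here is a naive byte comparison for brevity; it could be combined with a Boyer-Moore-style skip table as suggested in the other answers.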

0
source share

I would say the fastest way is to use a BufferedInputStream on top of a FileInputStream... or to use a custom buffer if you want to avoid instantiating a BufferedInputStream.

This explains it better than I can: http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/
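A sketch of the buffered approach, reading line by line through a large buffer; as a bonus it also counts lines along the way, which the question mentions wanting. The names are illustrative:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BufferedScan {
    // Streams the file through a 64 KB buffer and returns the 1-based
    // number of the first line containing the target, or -1 if absent.
    static long findLine(String path, String target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path), 1 << 16)) {
            long lineNo = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.contains(target)) return lineNo;
            }
            return -1;   // scanned the whole file; lineNo is the line count
        }
    }
}
```

The large buffer size is the point: it amortizes the cost of system calls over many lines, which is where most of the time goes in a naive unbuffered read.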

0
source share

Use the right tool for the job: a full-text search library.

My suggestion is to build an index in memory (or a file-based index with caching enabled) and then search it. As @Michael Borgwardt suggested, Lucene is the best library for this.

0
source share

I don't know if this is a stupid suggestion, but grep is a rather efficient file search tool. Perhaps you could invoke it with Runtime.getRuntime().exec(..)?
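A sketch of shelling out to grep, using ProcessBuilder rather than the raw Runtime.exec(..) call. This assumes grep is on the PATH, and -m 1 (stop after the first match) is a GNU/BSD grep option; the class name is illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class GrepCall {
    // Runs "grep -m 1 pattern file" and returns the first matching line,
    // or null if grep found nothing (it exits with status 1 in that case).
    static String grepFirst(String pattern, String file)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("grep", "-m", "1", pattern, file);
        pb.redirectErrorStream(true);
        Process p = pb.start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            line = r.readLine();
        }
        p.waitFor();
        return line;
    }
}
```

The obvious downsides are portability (no grep on a stock Windows machine) and the process startup cost, but for one-off scans of huge files a tuned native grep is hard to beat.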

0
source share

It depends on whether you need to search the file more than once. For a single search, read the file from disk and scan it with the tools suggested by Michael Borgwardt. If you need to search it more than once, you should probably build an index of the file with a tool like Lucene: read the file in, tokenize it, and store the tokens in the index. If the index is small enough, keep it in RAM (Lucene offers both a RAM-backed and a disk-backed index); if not, keep it on disk. And if it is too large for RAM and you are very, very, very concerned about speed, put the index on a solid-state/flash drive.

0
