Should I wrap FileInputStream in a BufferedInputStream?

I have a piece of code that reads a large number (hundreds of thousands) of relatively small files (a couple of KB each) from the local file system in a loop. A java.io.FileInputStream is created for each file to read its contents. The process is very slow and takes ages.

Do you think wrapping the FileInputStream in a java.io.BufferedInputStream would make a difference?

+7
java file-io
3 answers

If you are not already using a decent-sized byte[] buffer in your read/write loop (the current BufferedInputStream implementation defaults to 8 KB), then it will certainly make a difference. Try it yourself. Remember to wrap any OutputStream in a BufferedOutputStream as well.

But if you are already buffering with a byte[], it will make little difference either way: at that point the speed of the hard disk and the I/O controller is your bottleneck.

+9

I doubt very much that this will make a difference.

Your fundamental problem is the hundreds of thousands of tiny files. Reading those will thrash the disk and take forever no matter how you do it; you will spend 99.9% of the time waiting for mechanical movement inside the hard disk.

There are two ways to fix this:

  • Store the data on an SSD: seek latency is orders of magnitude lower than a spinning disk's (roughly 0.1 ms versus ~10 ms per random access).
  • Reorganize the data into a few large files and read them sequentially (see the sketch after this list).
+3

It depends on how you read the data. If you read from the FileInputStream very inefficiently (for example, calling read() byte by byte), then wrapping it in a BufferedInputStream can improve the situation dramatically. But if you already use a reasonably sized buffer with the FileInputStream, switching to BufferedInputStream will not matter.

Since you are talking about a large number of very small files, there is a good chance that most of the time is spent in file-system operations (opening and closing the files) rather than in actually reading their bytes.

+3
