Processing huge files in java

I have a huge file, about 10 GB in size, and I have to perform operations on it like sorting and filtering in Java. Each operation can be performed in parallel.

Is it a good idea to start 10 threads and read the file in parallel, with each thread reading 1 GB of the file? Is there a better way to handle extremely large files and process them as quickly as possible? Is NIO a good fit for such scenarios?

I am currently performing the operations sequentially, and it takes about 20 minutes to process such a file.

Thanks,

java file nio
2 answers

Is it good to start 10 threads and read the file in parallel?

Almost certainly not, although it depends. With an SSD (where there is virtually no seek time) it might help; with a traditional spinning drive, it almost certainly will not.

That does not mean you cannot use several threads, though. You could have one thread read the file, doing only the most elementary work needed to split the data into processable chunks, and then use a producer/consumer queue so that multiple threads process those chunks.
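A minimal sketch of that producer/consumer idea, using a bounded `BlockingQueue`. The class name, the `POISON` sentinel, and the "count lines containing ERROR" workload are all illustrative assumptions, not anything from the question; a real program would read chunks from a `BufferedReader` over the 10 GB file instead of an in-memory list.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;

// One producer feeds chunks of lines into a bounded queue; several
// consumer threads do the CPU-bound work (here, counting matching lines).
public class ProducerConsumerSketch {
    private static final List<String> POISON = List.of(); // end-of-input sentinel

    public static long countMatches(List<String> lines, int workers) throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(16); // bounded: reader can't outrun workers
        LongAdder matches = new LongAdder();

        Runnable consumer = () -> {
            try {
                while (true) {
                    List<String> chunk = queue.take();
                    if (chunk == POISON) {    // identity check for the sentinel
                        queue.put(POISON);    // pass it on so the other workers also stop
                        return;
                    }
                    chunk.forEach(l -> { if (l.contains("ERROR")) matches.increment(); });
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) { pool[i] = new Thread(consumer); pool[i].start(); }

        // Producer: in a real program this loop would read the file line by
        // line and enqueue chunks of, say, 10_000 lines each.
        int chunkSize = 2;
        for (int i = 0; i < lines.size(); i += chunkSize) {
            queue.put(lines.subList(i, Math.min(i + chunkSize, lines.size())));
        }
        queue.put(POISON);
        for (Thread t : pool) t.join();
        return matches.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sample = List.of("ok", "ERROR a", "ok", "ERROR b", "ERROR c");
        System.out.println(countMatches(sample, 3)); // prints 3
    }
}
```

The bounded queue is the important design choice: it applies back-pressure, so the single reader never buffers more than a few chunks ahead of the workers.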

Without knowing more than "sort, filter, etc." (which is rather vague), we cannot say how parallelizable the processing is in the first place — but trying to perform the I/O on a single file in parallel will probably not help.


Try profiling the code to find out where the bottlenecks are. Have you tried having one thread read the entire file (or as much of it as possible) and hand the data off to 10 threads for processing? If file I/O is your bottleneck (which seems plausible), this should improve the overall runtime.
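One way to sketch that single-reader/many-workers split is with an `ExecutorService`: the main thread plays the role of the reader and submits each chunk to a fixed pool, then combines the partial results from the `Future`s. The class name, the chunk sizes, and the "total character count" workload are hypothetical stand-ins for the asker's sort/filter operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single reader, pooled workers: the reader stays sequential (friendly to
// spinning disks) while the per-chunk processing runs in parallel.
public class SingleReaderPool {
    public static int totalLength(List<String> lines, int workers, int chunkSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            // "Reader" loop: in a real program this would pull lines from a
            // BufferedReader over the 10 GB file instead of an in-memory list.
            for (int i = 0; i < lines.size(); i += chunkSize) {
                List<String> chunk = lines.subList(i, Math.min(i + chunkSize, lines.size()));
                results.add(pool.submit(() -> chunk.stream().mapToInt(String::length).sum()));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get(); // combine partial results
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalLength(List.of("ab", "cde", "f"), 4, 2)); // prints 6
    }
}
```

For operations like the asker's sort, the per-chunk result would be a sorted chunk rather than an integer, with a final merge step afterwards (essentially an external merge sort).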

