BufferedReader in a multi-core environment

I have 8 files. Each of them is about 1.7 GB. I read each file into a byte array, and that step is fast enough.

Then each file is read as follows:

BufferedReader br=new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data))); 
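For reference, a minimal sketch of how such a reader is typically consumed (this is an assumption about the processing loop, not the author's actual code; note that `InputStreamReader` falls back to the platform default charset unless one is given explicitly):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadLines {
    // Count lines in an in-memory byte array; the charset is specified
    // explicitly so behavior does not depend on the platform default.
    static int countLines(byte[] data) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data),
                                      StandardCharsets.UTF_8))) {
            int lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nb\nc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countLines(data)); // prints 3
    }
}
```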

When the files are processed sequentially on a single core, each takes about 60 seconds. However, when the work is distributed across 8 separate cores, each file takes much longer than 60 seconds.

Since all the data is in memory and no I/O is performed, I would expect each core to process its file in no more than 60 seconds, so all 8 files should finish in just over 60 seconds total — but this is not the case.

Am I missing something about the behavior of BufferedReader, or of any of the other readers used in the code above?

It might be worth mentioning that I first load the files with this code:

 byte[] content=org.apache.commons.io.FileUtils.readFileToByteArray(new File(filePath)); 

The overall structure of the code looks like this:

 for each file
     read the file into a byte[]
     add the byte[] to a list
 end for

 for each item in the list
     create a thread and pass the byte[] to it
 end for
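The pseudocode above can be sketched in Java roughly as follows. This is a reconstruction, not the author's actual code: `process()` is a placeholder for the real per-file work, and `Files.readAllBytes` stands in for `FileUtils.readFileToByteArray`:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelFiles {
    // Stand-in for the real per-file computation; here it just reports size.
    static long process(byte[] data) {
        return data.length;
    }

    // Phase 2 of the pseudocode: spawn one thread per in-memory byte[]
    // and wait for all of them to finish.
    static long processAll(List<byte[]> contents) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (byte[] data : contents) {
            Thread t = new Thread(() -> total.addAndGet(process(data)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Phase 1 of the pseudocode: read every file named on the
        // command line into memory first.
        List<byte[]> contents = new ArrayList<>();
        for (String path : args) {
            contents.add(Files.readAllBytes(new File(path).toPath()));
        }
        System.out.println("total bytes: " + processAll(contents));
    }
}
```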
2 answers

How exactly do you "distribute the computation"? Is there any synchronization involved? Are you simply creating 8 threads that each read one of the 8 files?

What platform are you running on (Linux, Windows, etc.)? I have seen seemingly strange behavior from the Windows scheduler before, where it moves a process from core to core in an attempt to balance the load. This ultimately resulted in worse performance than simply letting one core stay busier than the rest.


How much memory does your system have?

8 × 1.7 GB, plus operating-system overhead, may mean that virtual memory/paging comes into play. That is obviously much slower than RAM.

I appreciate that you say each file is in memory, but do you actually have 16 GB of free RAM, or is more going on underneath at the OS level?

If each context switch also forces pages to be swapped in and out, that would explain the increase in time.
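As a quick sanity check on this theory, you can print what the JVM itself thinks it has to work with (holding 8 × 1.7 GB in byte arrays requires the heap, set via `-Xmx`, to be at least that large):

```java
public class MemCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // Maximum heap the JVM will ever use (the -Xmx limit).
        System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");
        // Heap currently reserved from the OS.
        System.out.println("allocated:  " + rt.totalMemory() / mb + " MB");
        // Unused portion of the currently reserved heap.
        System.out.println("free:       " + rt.freeMemory() / mb + " MB");
        // Cores visible to the JVM.
        System.out.println("processors: " + rt.availableProcessors());
    }
}
```

If `max heap` is well under ~14 GB, the arrays cannot all be resident on the Java heap at once, regardless of how much physical RAM the machine has.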

