Is reading files in parallel faster than reading them sequentially?

Question


I'm wondering whether reading files in parallel, using PLINQ or Parallel, could be faster than reading them one by one. My code is as follows (.NET 4.0):

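    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    // File.Read does not exist in .NET; File.ReadAllBytes is used here instead.
    public static void ReadFileParallel(List<string> fileName)
    {
        Parallel.ForEach(fileName, file => File.ReadAllBytes(file));
    }

    public static void ReadFilePLINQ(List<string> fileName)
    {
        fileName.AsParallel().ForAll(file => File.ReadAllBytes(file));
    }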

I ask because I thought that reading a file is I/O-bound, so running the reads in parallel would not help. Am I correct?

+6

c# file-io

Graviton Jul 13 '10 at 14:02


7 answers

You would think so, but that is not what measurements show. When file I/O has significant latency, especially over a network, issuing reads in parallel can keep the pipe full.

+1

Steven sudit Jul 13 '10 at 14:08


Microsoft has an excellent PDF document that explores the options for parallelism and threading in detail.

It may help.

http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee&displaylang=en

0

keyle Jul 13 '10 at 14:06


To a first approximation, it will help if the files are on different disks and slow things down otherwise (because of the increased seek time).

It can be a little faster if all the files are already cached (since you can then use multiple cores).

The best approach, of course, is to run some tests.
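A minimal sketch of such a test, assuming the whole of each file is read with `File.ReadAllBytes`. The class and method names (`ReadBenchmark`, `ReadSequential`, `ReadParallel`, `TimeMs`) are made up for illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ReadBenchmark
{
    // Reads every file one after another and returns the total bytes read.
    public static long ReadSequential(IEnumerable<string> paths)
    {
        long total = 0;
        foreach (string path in paths)
        {
            total += File.ReadAllBytes(path).Length;
        }
        return total;
    }

    // Reads the files with Parallel.ForEach and returns the total bytes read.
    public static long ReadParallel(IEnumerable<string> paths)
    {
        long total = 0;
        Parallel.ForEach(paths, path =>
        {
            int length = File.ReadAllBytes(path).Length;
            Interlocked.Add(ref total, length); // thread-safe accumulation
        });
        return total;
    }

    // Times a single run of the given read strategy, in milliseconds.
    public static long TimeMs(Func<IEnumerable<string>, long> read, IEnumerable<string> paths)
    {
        Stopwatch watch = Stopwatch.StartNew();
        read(paths);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling `ReadBenchmark.TimeMs(ReadBenchmark.ReadSequential, paths)` and `ReadBenchmark.TimeMs(ReadBenchmark.ReadParallel, paths)` gives the two timings. Whichever strategy runs second will probably find the files in the operating system's cache, so it is worth repeating the runs and alternating the order before drawing any conclusion.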

0

tc. Jul 13 '10 at 14:08


You are not parallelizing a single file read; you are running several file reads in parallel. If the files are on different spindles, you will get better throughput simply by using multiple spindles at once.

You can also see improved performance on a single spindle if each read is followed by CPU-bound processing, although in that case it would be much better to schedule Task objects. That way, some tasks load data from the files while others use the already loaded data to do the heavy processing.
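One possible shape for that arrangement, sketched with `Task.Factory.StartNew` and a `BlockingCollection<T>` (both available in .NET 4.0). The `ReadThenProcess` class, the `Process` method (a stand-in for the real CPU-bound work) and the queue capacity of 4 are assumptions made for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ReadThenProcess
{
    // Hypothetical stand-in for the CPU-bound step: sums the bytes of a file.
    public static long Process(byte[] data)
    {
        long sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }

    // One task reads the files sequentially; workerCount tasks process loaded data.
    public static long Run(IEnumerable<string> paths, int workerCount)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException("workerCount");

        long total = 0;
        // Bounded so the reader cannot run far ahead and exhaust memory.
        using (var loaded = new BlockingCollection<byte[]>(boundedCapacity: 4))
        {
            Task reader = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string path in paths) loaded.Add(File.ReadAllBytes(path));
                }
                finally
                {
                    loaded.CompleteAdding(); // lets the workers drain the queue and exit
                }
            });

            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    foreach (byte[] data in loaded.GetConsumingEnumerable())
                    {
                        Interlocked.Add(ref total, Process(data));
                    }
                });
            }

            Task.WaitAll(workers);
            reader.Wait(); // surfaces any I/O exception from the reader task
        }
        return total;
    }
}
```

Keeping a single reader means the disk sees one sequential stream of requests, while the processing spreads across cores.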

0

Panagiotis kanavos Jul 13 '10 at 14:11


I think you pretty much hit the nail on the head.

Parallel operations in general are always constrained by the point at which you run out of resources to parallelize over, and even before that point you get diminishing returns as the number of parallel threads increases.

Jeff Atwood wrote an interesting tweet, which I'll add here later, showing the diminishing returns of heavily multi-threaded processors. Of course, that is not exactly the same thing. But consider that even if you had 100 files on 100 hard drives, at some point the I/O is funneled down to a single channel, which will reduce the gain in read speed.

What I'm basically trying to say is that just running something in parallel does not mean it will be faster; it is important to consider how the parallel processes actually work.

0

msarchet Jul 13 '10 at 14:15


This is a tricky business. If you get it wrong, the disk head moves back and forth trying to read two files at the same time. This matters most for large files.

If you read a lot of small files in parallel, however, you may gain a little, because the disk subsystem may choose to read the files in a different order than you requested. That said, I have not seen this effect in real life.

Also, the processing that you do on the content can run in parallel with reading the files. So measure and verify before you commit to an approach.

0

user180326 Jul 13 '10 at 14:25


Dave markle · Accepted Answer · 2010-07-13T14:07:49+0000

It depends.

If your files are in different places, such as different network resources or different physical hard drives, then yes, loading them in parallel may help. If they are on the same spinning hard drive, reading them in parallel is likely to hurt performance significantly because of the extra seek time those parallel reads incur.

If your files are on an SSD, you are likely to get slightly lower performance, but that will depend on how many files you read in parallel and how large they are. I would expect performance to drop significantly beyond a certain threshold of file size and number of concurrent reads. It is hard to say without some experimentation.
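One way to run that experiment is to cap the number of concurrent reads and vary the cap. The sketch below uses `ParallelOptions.MaxDegreeOfParallelism` for this; the `BoundedReader` class and its `ReadAll` method are hypothetical names, and this is only one possible approach, not something the answer itself prescribes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedReader
{
    // Reads all files with at most maxConcurrentReads reads in flight
    // and returns the total number of bytes read.
    // maxConcurrentReads must be positive (or -1 for "no limit").
    public static long ReadAll(IEnumerable<string> paths, int maxConcurrentReads)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrentReads };
        Parallel.ForEach(paths, options, path =>
        {
            Interlocked.Add(ref total, File.ReadAllBytes(path).Length);
        });
        return total;
    }
}
```

Timing `BoundedReader.ReadAll(paths, n)` for n = 1, 2, 4, 8 and so on, on the actual drive and with realistic file sizes, should show where throughput stops improving or starts to fall.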
