Reading multiple files in multiple threads using C#: slow!

I have an Intel Core 2 Duo processor. I read 3 files from my C: drive and show some corresponding values from the files in an EditBox on screen. The whole process takes 2 minutes. Then I thought about processing each file in a separate thread, and now the whole process takes 2 minutes 30 seconds, i.e. 30 seconds more than the single-threaded version.

I was expecting the opposite! I can see both graphs in the CPU usage history. Can someone please explain what is happening? Here is my code snippet.

  foreach (FileInfo file in FileList)
  {
      Thread t = new Thread(new ParameterizedThreadStart(ProcessFileData));
      t.Start(file.FullName);
  }

where ProcessFileData is the file processing method.

Thanks!

+4
4 answers

The root of the problem is that the files are on the same drive and, unlike your dual-core processor, your hard drive can only do one thing at a time.

If you read two files at the same time, the disk heads have to move back and forth between them. Given that your hard drive can read each file in about 40 seconds, it now has the additional overhead of moving the heads between three separate files while reading.

The fastest way to read multiple files from one hard drive is to do everything in one thread and read them one by one. That way the head only has to seek once per file (at the very beginning), rather than many times during each read.
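As a rough illustration of the one-by-one approach, here is a minimal sketch. The `SequentialReader` class and the two-argument `processFileData` delegate are made up for this example (the question's `ProcessFileData` takes only a file name), and it assumes each file fits comfortably in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not from the question: reads files strictly one at a time.
static class SequentialReader
{
    // Reads each file completely on the calling thread before touching the next,
    // so the disk is never asked to serve two files at once.
    // Returns the total number of bytes read.
    public static long ReadAll(IEnumerable<string> paths, Action<string, byte[]> processFileData)
    {
        long totalBytes = 0;
        foreach (string path in paths)
        {
            byte[] contents = File.ReadAllBytes(path); // one sequential pass over the file
            processFileData(path, contents);           // processing happens after the read
            totalBytes += contents.Length;
        }
        return totalBytes;
    }
}
```

In a UI application you would probably still call this from one background thread so the form stays responsive; the point is only that a single thread talks to the disk.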

To speed this up, you either need to change your logic (do you really need to read the entire contents of all three files?), or buy a faster hard drive, put the 3 files on three different hard drives and use threading, or use RAID.

+10

If you are reading from a disk using multiple threads, the disk heads will bounce from one part of the disk to another, since each thread is reading from a different part of the disk. This can significantly reduce throughput, as you saw.

For this reason, it is actually often a good idea to have all disk access go through a single thread, to help minimize seeks.
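One way to arrange that, sketched under some assumptions: a single reader task loads files one at a time and hands them to worker tasks through a bounded queue, so only the CPU-side work runs in parallel. `SingleReaderPipeline`, `workerCount` and the `processFileData` delegate are names invented for this example, and it relies on `BlockingCollection` and `Task.Run`, which need .NET 4.5 or later. Error handling is deliberately minimal.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: one task does all disk reads, several tasks do the processing.
static class SingleReaderPipeline
{
    public static void Run(IEnumerable<string> paths, Action<string, byte[]> processFileData, int workerCount)
    {
        // Bounded, so the reader cannot run far ahead and hold every file in memory.
        using (var queue = new BlockingCollection<KeyValuePair<string, byte[]>>(2))
        {
            // The only task that touches the disk: reads the files one at a time.
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                        queue.Add(new KeyValuePair<string, byte[]>(path, File.ReadAllBytes(path)));
                }
                finally
                {
                    queue.CompleteAdding(); // lets the workers finish even if a read fails
                }
            });

            // Workers take loaded files off the queue and process them in parallel.
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    foreach (KeyValuePair<string, byte[]> item in queue.GetConsumingEnumerable())
                        processFileData(item.Key, item.Value);
                });
            }

            Task.WaitAll(workers);
            reader.Wait(); // surfaces any exception thrown while reading
        }
    }
}
```

This only helps if the processing step is expensive; if nearly all the time goes into reading, the plain sequential loop will be just as fast.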

If your task is I/O-bound, and if it has to be done often, you could look at a tool like "contig" to make sure that the layout of your files on disk is optimized/contiguous.

+3

If your processing is mostly I/O-bound rather than CPU-bound, then it makes sense that it takes the same time or even longer.

How do you compare these files? You should work out what the bottleneck of your application is: disk input/output, CPU, memory...

Multithreading is only interesting for CPU-bound processing, i.e. complex calculations, comparing data in memory, sorting, etc.
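A crude way to check where the time goes is to time the read and the processing separately. This is only a sketch: `BottleneckProbe` and the `processFileData` delegate are hypothetical names, and it assumes the file fits in memory. Keep in mind that the OS file cache can make a second run over the same file read much faster than the first.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: times the I/O part and the CPU part of handling one file.
static class BottleneckProbe
{
    public static void Measure(string path, Action<byte[]> processFileData,
                               out TimeSpan readTime, out TimeSpan processTime)
    {
        Stopwatch watch = Stopwatch.StartNew();
        byte[] contents = File.ReadAllBytes(path); // I/O part
        readTime = watch.Elapsed;

        watch.Restart();
        processFileData(contents);                 // CPU part
        processTime = watch.Elapsed;
    }
}
```

If `readTime` dominates, extra threads reading from the same drive are unlikely to help; if `processTime` dominates, splitting the processing across cores may be worthwhile.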

+1

Since your process is I/O-bound, you should let the OS do the threading for you. Take a look at FileStream.BeginRead() for an example of how to queue up your reads. Your callback can call EndRead() and then issue the next request to read the next block of data, passing itself as the callback again so that it handles each subsequent completed block.
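Here is a sketch of that chained BeginRead()/EndRead() pattern, under some assumptions. `ChainedAsyncReader`, `Pump`, `Consume` and the `processBlock` delegate are invented for this example. To keep it short, `ReadFile` blocks on an event until the file is finished; a real UI application would more likely signal completion through its own callback. The `CompletedSynchronously` checks are there so that reads which finish immediately are handled in a loop instead of nesting callbacks.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: reads one file block by block, re-issuing the read from its own callback.
sealed class ChainedAsyncReader
{
    private readonly FileStream stream;
    private readonly byte[] buffer;
    private readonly Action<byte[], int> processBlock;
    private readonly ManualResetEvent done = new ManualResetEvent(false);
    private Exception error;

    private ChainedAsyncReader(FileStream stream, int blockSize, Action<byte[], int> processBlock)
    {
        this.stream = stream;
        this.buffer = new byte[blockSize];
        this.processBlock = processBlock;
    }

    // Calls processBlock(buffer, count) for each block, then returns at end of file.
    public static void ReadFile(string path, int blockSize, Action<byte[], int> processBlock)
    {
        // useAsync: true asks the OS for asynchronous (overlapped) I/O on this handle.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, blockSize, true))
        {
            var reader = new ChainedAsyncReader(stream, blockSize, processBlock);
            reader.Pump();
            reader.done.WaitOne();
            if (reader.error != null)
                throw new IOException("Asynchronous read failed.", reader.error);
        }
    }

    // Issues reads until one is left pending; its callback then resumes the chain.
    private void Pump()
    {
        try
        {
            while (true)
            {
                IAsyncResult result = stream.BeginRead(buffer, 0, buffer.Length, OnReadCompleted, null);
                if (!result.CompletedSynchronously) return; // OnReadCompleted will continue
                if (!Consume(result)) break;                // end of file
            }
        }
        catch (Exception ex)
        {
            error = ex;
        }
        done.Set();
    }

    private void OnReadCompleted(IAsyncResult result)
    {
        if (result.CompletedSynchronously) return; // already handled inside Pump
        bool more;
        try
        {
            more = Consume(result);
        }
        catch (Exception ex)
        {
            error = ex;
            done.Set();
            return;
        }
        if (more) Pump(); else done.Set();
    }

    // Finishes one read and hands the block to the caller; false means end of file.
    private bool Consume(IAsyncResult result)
    {
        int bytesRead = stream.EndRead(result);
        if (bytesRead == 0) return false;
        processBlock(buffer, bytesRead);
        return true;
    }
}
```

Note that the same buffer is reused for every block, so `processBlock` must finish with the data before it returns.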

In addition, by creating extra threads you give the OS more threads to manage. And if a different processor happens to be chosen to handle the completed read, you lose everything that was cached on the processor your thread was previously running on.

As you have found, you cannot “speed up” the application by simply adding threads.

0
