C# TPL tasks - how many at the same time?

I am learning how to use the TPL to parallelize my application. The application processes ZIP files, extracting all the files stored in them and importing the contents into a database. There may be several thousand ZIP files awaiting processing at any given time.

Should I start a separate task for each of these ZIP files, or is that an inefficient way to use the TPL?

Thanks.

+7
4 answers

This seems like a problem better suited to worker threads (one thread per file) managed by the ThreadPool than to the TPL. The TPL shines when you can divide and conquer a single piece of data, but your ZIP files are each processed individually.

Disk I/O will be your bottleneck, so I think you will need to throttle the number of jobs running at one time. That is easy to manage with worker threads, but I'm not sure how much control (if any) you have over the degree of parallelism with a parallel for/foreach; too much parallelism can thrash the disk and actually slow your process down.
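For what it's worth, `Parallel.ForEach` does let you cap the degree of parallelism through `ParallelOptions`. A minimal sketch, assuming `ProcessZip` and the sample paths stand in for the real extract-and-import work:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    public static int Processed;

    // Placeholder for the real "extract and import into the database" work.
    static void ProcessZip(string path)
    {
        Interlocked.Increment(ref Processed);
        Console.WriteLine($"Processing {path}");
    }

    public static void Main()
    {
        IEnumerable<string> zipPaths = new[] { "a.zip", "b.zip", "c.zip" };

        var options = new ParallelOptions
        {
            // Cap the number of simultaneous extractions so the disk
            // isn't thrashed by too many concurrent readers.
            MaxDegreeOfParallelism = 4
        };

        Parallel.ForEach(zipPaths, options, ProcessZip);
    }
}
```

The right cap is workload-dependent; for disk-bound work it is often far below the core count.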

+4

Any time you have a lengthy process, you can usually gain extra performance on multi-processor systems by creating a separate thread for each input task. So I would say you are most likely on the right track.

+1

I would think this depends on whether the process is CPU-bound or disk-bound. If it is disk-bound, starting too many threads could be a bad idea, as the different extractions would just compete with each other.

This sounds like something you need to measure in order to find out what works best.

+1

I disagree with some of the statements here, guys.

First of all, I don't see a difference between the ThreadPool and tasks in terms of coordination or control. Tasks run on the ThreadPool anyway, and you have easier control over tasks: exceptions are propagated nicely to the caller when awaiting a task or waiting on Task.WhenAll(tasks), etc.
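To illustrate that point about exception propagation, here is a small sketch (the exception and task bodies are made up for the example): a fault inside a task resurfaces at the `await Task.WhenAll` call site.

```csharp
using System;
using System.Threading.Tasks;

class WhenAllDemo
{
    public static string Caught;

    static async Task Main()
    {
        Task ok  = Task.Run(() => Console.WriteLine("fine"));
        Task bad = Task.Run(() => throw new InvalidOperationException("boom"));

        try
        {
            await Task.WhenAll(ok, bad);
        }
        catch (InvalidOperationException ex)
        {
            // The exception thrown on the worker task reaches the caller here.
            Caught = ex.Message;
            Console.WriteLine($"Caught: {ex.Message}");
        }
    }
}
```

With raw ThreadPool work items you would have to marshal such failures back to the caller yourself.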

Secondly, I/O will not necessarily be the only bottleneck here. Depending on the data and the compression level, unzipping may well take more time than reading the file from disk.

There are different ways to judge this, but I would start with something like the number of processor cores, or a little less.

Load the file paths into a ConcurrentQueue, then start that many tasks, each of which dequeues a path, reads the file, unzips it, and saves the contents.

From there, you can tune the task count against the number of cores and experiment with load balancing.
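The queue-plus-workers idea above can be sketched as follows; `ImportZip` and the generated file names are placeholders for the real unzip-and-import step:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class WorkerQueueDemo
{
    public static int Imported;

    // Placeholder for "unzip the file and write its contents to the database".
    static void ImportZip(string path)
    {
        Interlocked.Increment(ref Imported);
        Console.WriteLine($"Imported {path}");
    }

    static async Task Main()
    {
        // Thread-safe queue pre-filled with the pending file paths.
        var queue = new ConcurrentQueue<string>(
            Enumerable.Range(1, 10).Select(i => $"file{i}.zip"));

        // One worker task per core; each drains the queue until it is empty.
        int workerCount = Environment.ProcessorCount;
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                while (queue.TryDequeue(out string path))
                    ImportZip(path);
            }))
            .ToArray();

        await Task.WhenAll(workers);
    }
}
```

Because each worker pulls the next path only when it finishes the previous one, slow files don't stall the others, which gives you the load balancing mentioned above.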

I don't know whether the ZIP format supports splitting a single archive into parts for parallel decompression, but in some complex cases that might be a good idea, especially for large files...

Wow, the question is 6 years old, bummer! I didn't notice... :)

0
