Parallelizing several dependent operations with different bounds

Here is a list of tasks that I need to complete:

  • Read a fragment of the file (disk I/O bound)
  • Encrypt that fragment (CPU bound)
  • Upload that fragment (network I/O bound)
  • Repeat until the whole file is uploaded.

The problem is how to achieve this with maximum efficiency and throughput.

I tried using Parallel.For to wrap the entire unit of work, but I do not think this is the best way to approach the problem, given that each operation is bound by a different resource (as indicated in the list above).

After reading the TPL article suggested in this question, and after looking at the empirical data in this question, I think TPL is the way to go. But how should I break the work down for maximum efficiency and throughput? Should I even bother multithreading the first two operations, given that the upload is likely to be the bottleneck of the whole operation?

Thanks for your input.

Edit:

I tried using Tasks and ContinueWith to let the OS handle this, but I think I've hit another wall: while I wait for all my upload tasks to complete, it looks like the garbage collector isn't cleaning up the data I read for the uploads, and as a result I'm running out of memory. Another issue to consider.

1 answer

If you couldn't use .NET 4.5, I would suggest using one thread for reading from disk, one thread for encryption, and one thread for uploading. To communicate between them, you would use the producer-consumer pattern in the form of a BlockingCollection<byte[]> between each pair of threads (1-2 and 2-3).
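The three-thread arrangement described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name ThreadedPipeline, the Run signature, and the XOR-based Encrypt placeholder are hypothetical and not from the question, and Task.Factory.StartNew with TaskCreationOptions.LongRunning stands in for creating dedicated threads by hand.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ThreadedPipeline
{
    // Placeholder transform, NOT real encryption: XORs every byte with a constant.
    private static byte[] Encrypt(byte[] data)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ 0x5A);
        return result;
    }

    // Reads from input, transforms, and writes to output using three workers.
    // The bounded collections block a producer once `capacity` chunks are
    // queued, which caps memory use.
    public static void Run(Stream input, Stream output, int chunkSize, int capacity)
    {
        using (var toEncrypt = new BlockingCollection<byte[]>(capacity))
        using (var toUpload = new BlockingCollection<byte[]>(capacity))
        {
            // Stage 1: read chunks from disk.
            var reader = Task.Factory.StartNew(() =>
            {
                try
                {
                    while (true)
                    {
                        var chunk = new byte[chunkSize];
                        int read = input.Read(chunk, 0, chunk.Length);
                        if (read == 0) break;
                        // Trim short reads so no padding bytes are passed on.
                        if (read < chunk.Length) Array.Resize(ref chunk, read);
                        toEncrypt.Add(chunk);
                    }
                }
                finally { toEncrypt.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 2: encrypt each chunk.
            var encryptor = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var chunk in toEncrypt.GetConsumingEnumerable())
                        toUpload.Add(Encrypt(chunk));
                }
                finally { toUpload.CompleteAdding(); }
            }, TaskCreationOptions.LongRunning);

            // Stage 3: write ("upload") each encrypted chunk in order.
            var uploader = Task.Factory.StartNew(() =>
            {
                foreach (var chunk in toUpload.GetConsumingEnumerable())
                    output.Write(chunk, 0, chunk.Length);
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(reader, encryptor, uploader);
        }
    }
}
```

Calling ThreadedPipeline.Run(fileStream, uploadStream, chunkSize, capacity) with any readable and writable Stream pair would drive the pipeline. Note that the sketch has no cancellation: if a later stage throws, an earlier stage can stay blocked on a full collection, so real code would likely want a CancellationToken.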

But since you can use .NET 4.5, you can use TPL Dataflow, which is ideal for this task. Using TPL Dataflow means that you won't waste threads on reading and uploading (although this most likely doesn't matter much to you). More importantly, it means that you can easily parallelize the encryption of each fragment (assuming your encryption scheme allows that).

What you would do is have one block for encryption, one block for uploading, and one asynchronous task (it does not actually have to be a full Task) for reading from the file. The encryption block can be configured to run in parallel, and both blocks must be configured with a bounded capacity (otherwise, throttling will not work correctly and the entire file will be read as quickly as possible, which can lead to an OutOfMemoryException).

In code:

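    // Requires: using System.Threading.Tasks.Dataflow;

    // Final stage: writes each encrypted chunk to the upload stream.
    var uploadBlock = new ActionBlock<byte[]>(
        data => uploadStream.WriteAsync(data, 0, data.Length),
        new ExecutionDataflowBlockOptions { BoundedCapacity = capacity });

    // Middle stage: encrypts chunks, possibly several at a time.
    var encryptBlock = new TransformBlock<byte[], byte[]>(
        data => Encrypt(data),
        new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = capacity,
            MaxDegreeOfParallelism = degreeOfParallelism
        });

    encryptBlock.LinkTo(
        uploadBlock,
        new DataflowLinkOptions { PropagateCompletion = true });

    // First stage: read the file chunk by chunk.
    while (true)
    {
        byte[] chunk = new byte[chunkSize];
        int read = await fileStream.ReadAsync(chunk, 0, chunk.Length);
        if (read == 0)
            break;
        // A read can return fewer bytes than requested (typically the last
        // chunk), so trim the buffer to avoid sending padding bytes.
        if (read < chunk.Length)
            Array.Resize(ref chunk, read);
        // SendAsync waits while encryptBlock is full, which throttles reading.
        await encryptBlock.SendAsync(chunk);
    }

    fileStream.Close();
    encryptBlock.Complete();
    await uploadBlock.Completion;
    uploadStream.Close();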
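
To see why BoundedCapacity matters for throttling, here is a small sketch of how a bounded block treats a producer that is faster than its consumer. The class name BoundedCapacityDemo and the gate used to simulate a slow consumer are made up for illustration; only ActionBlock, Post, SendAsync, Complete and Completion are TPL Dataflow members.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedCapacityDemo
{
    public static async Task<(bool Accepted1, bool Accepted2, bool Accepted3,
                              bool SendWaited, bool SendAccepted)> RunAsync()
    {
        // The gate keeps the consumer "busy" until we release it.
        var gate = new TaskCompletionSource<bool>();

        var slowBlock = new ActionBlock<int>(
            async _ => { await gate.Task; },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });

        // Queued items and the item being processed both count toward the bound.
        bool accepted1 = slowBlock.Post(1);
        bool accepted2 = slowBlock.Post(2);

        // Post never waits: when the block is full it declines the item.
        bool accepted3 = slowBlock.Post(3);

        // SendAsync instead returns a task that completes once there is room.
        Task<bool> pending = slowBlock.SendAsync(3);
        bool sendWaited = !pending.IsCompleted;

        // Let the consumer drain; the pending send can now be accepted.
        gate.SetResult(true);
        bool sendAccepted = await pending;

        slowBlock.Complete();
        await slowBlock.Completion;

        return (accepted1, accepted2, accepted3, sendWaited, sendAccepted);
    }
}
```

Awaiting SendAsync in the reading loop is what pauses the file reads while the pipeline is full. With the default unbounded capacity, every send would be accepted immediately and chunks would pile up in memory.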