Limiting concurrent work on a streamed resource

I recently discovered the SemaphoreSlim class and am using it to limit the number of concurrent operations when parallelizing work over a (large) streamed resource:

    // The below code is an example of the structure of the code; there are some
    // omissions around handling of tasks that do not run to completion that
    // should be in production code.
    SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
    foreach (var result in StreamResults())
    {
        semaphore.Wait();
        var task = DoWorkAsync(result).ContinueWith(t => semaphore.Release());
        ...
    }

The point is to avoid bringing more results into memory than the program can handle (which typically manifests as an OutOfMemoryException). The code works and is reasonably efficient, but it still feels awkward, notably the someMagicNumber multiplier: although it was tuned through profiling, it may not be as close to optimal as it could be, and it is not robust against changes in the implementation of DoWorkAsync.
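For what it's worth, here is a minimal sketch of how the completion handling I omitted might look, keeping the question's StreamResults, DoWorkAsync and someMagicNumber; the semaphore is released in a finally block and the task list with a final WhenAll is just one possible way to observe failures, not part of my original code:

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    class ThrottledProcessor
    {
        // Placeholders standing in for the members referenced in the question.
        static IEnumerable<string> StreamResults() { yield break; }
        static Task DoWorkAsync(string result) { return Task.Delay(10); }
        const int someMagicNumber = 4;

        static async Task ProcessAsync()
        {
            var semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
            var tasks = new List<Task>();

            foreach (var result in StreamResults())
            {
                // Blocks the reading loop once too many items are in flight.
                semaphore.Wait();
                tasks.Add(ProcessOneAsync(result, semaphore));
            }

            // Observe faults from any tasks still in flight.
            await Task.WhenAll(tasks);
        }

        static async Task ProcessOneAsync(string result, SemaphoreSlim semaphore)
        {
            try
            {
                await DoWorkAsync(result);
            }
            finally
            {
                // Released even if DoWorkAsync throws, so the loop never stalls.
                semaphore.Release();
            }
        }
    }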

Just as thread pooling solves the problem of scheduling many units of work onto a limited number of threads, I would like something that solves the problem of scheduling many items that need to be loaded into memory, based on the resources actually available.
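The closest thing I can sketch still relies on an estimate: for illustration only, something like MemoryFailPoint can check whether a guessed amount of memory is likely to be available before admitting more work, but the per-item estimate (64 MB below) is itself just another magic number:

    using System;
    using System.Runtime;
    using System.Threading;

    class MemoryGate
    {
        // Hypothetical per-item memory estimate in megabytes; a real value
        // would have to come from profiling, which is exactly the hard part.
        const int EstimatedMegabytesPerItem = 64;

        static void RunWhenMemoryAllows(Action work)
        {
            while (true)
            {
                try
                {
                    // Throws InsufficientMemoryException up front if the
                    // estimated amount of memory is unlikely to be available.
                    using (new MemoryFailPoint(EstimatedMegabytesPerItem))
                    {
                        work();
                        return;
                    }
                }
                catch (InsufficientMemoryException)
                {
                    // Back off and let in-flight work finish before retrying.
                    Thread.Sleep(TimeSpan.FromSeconds(1));
                }
            }
        }
    }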

Since it is impossible to decide deterministically whether an OutOfMemoryException will occur, I understand that what I'm looking for can only be achieved by statistical means, or not at all, but I hope I'm missing something.

1 answer

I would say you are probably overthinking this problem. The consequences of setting the limit too high are quite severe (the program crashes); the consequences of setting it too low are that the program may run somewhat slower. As long as you have some buffer above the minimum, increasing the buffer further generally isn't particularly effective unless the processing time of the tasks in the pipe is unusually variable.

If your buffer is consistently being filled, that generally means the stage before it in the pipe produces items a bit faster than the stage after it consumes them, so even a fairly small buffer will almost always ensure that the next stage has some work available. The buffer size needed to get 90% of the benefit of a buffer is usually quite small (perhaps a few dozen items), while the size needed to hit an OOM error is likely 6+ orders of magnitude larger. As long as you are somewhere between those two numbers (and that is a fairly large range to land in), you'll be fine.
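To make the buffer reasoning concrete, here is a hypothetical producer/consumer sketch using a bounded BlockingCollection rather than your semaphore; the capacity of 32 is an arbitrary stand-in for the "few dozen items" above, and the blocking Add bounds memory use in exactly the same way the semaphore does:

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class BoundedPipe
    {
        static void Run()
        {
            // A few dozen slots is typically enough to keep the consumer busy;
            // 32 is an arbitrary stand-in, not a recommendation.
            using (var buffer = new BlockingCollection<int>(boundedCapacity: 32))
            {
                var consumer = Task.Run(() =>
                {
                    foreach (var item in buffer.GetConsumingEnumerable())
                    {
                        // Process the item (stand-in for the real work).
                    }
                });

                for (int i = 0; i < 1000000; i++)
                {
                    // Blocks when the buffer is full, so memory use stays
                    // bounded no matter how fast the producer is.
                    buffer.Add(i);
                }

                buffer.CompleteAdding();
                consumer.Wait();
            }
        }
    }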

Just run your profiling tests, pick a static number, maybe add a few percent as a safety margin, and you should be fine. At most, I would move the magic number into a configuration file so that it can be changed without recompiling in case the source data or the machine's specifications change radically.
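As a sketch of what that configuration lookup could look like with System.Configuration (the key name WorkerMultiplier and the fallback value are made up), the multiplier then feeds straight into the SemaphoreSlim constructor:

    using System.Configuration; // requires a reference to System.Configuration

    static class ThrottleSettings
    {
        // App.config:
        //   <appSettings>
        //     <add key="WorkerMultiplier" value="4" />
        //   </appSettings>
        public static int ReadMultiplier()
        {
            const int defaultMultiplier = 4;
            string raw = ConfigurationManager.AppSettings["WorkerMultiplier"];
            int value;
            return int.TryParse(raw, out value) ? value : defaultMultiplier;
        }
    }

    // Usage:
    //   new SemaphoreSlim(Environment.ProcessorCount * ThrottleSettings.ReadMultiplier());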

