Is there a way to execute the method several times, but manage connections / threads? (.NET)

Question


I have a method that uses a connection (e.g. a method that downloads a page).
I need to execute this method many times (e.g. download 1000 pages).
Doing this synchronously and sequentially takes a long time.
I have limited resources (a maximum of 8 threads and/or 50 simultaneous connections).
I want to use all of those resources to speed it up.
I know that parallelization (PLINQ, Parallel Extensions, etc.) could solve the problem, but I have already tried it, and that approach fails because of the scarce resources.
I don't want to reinvent the wheel by writing my own resource-aware parallelization; someone must have done this before and published a library or tutorial for it.

Can anyone help?

Update: Things get a lot more complicated when you start mixing asynchronous calls with parallelization for maximum performance. Several downloaders implement this, such as the Firefox downloader: it fetches 2 downloads simultaneously, and when one of them completes it fetches the next file, and so on. It may seem very simple to implement, but when I implemented it I still had trouble making it generic (useful for both WebRequest and DbCommand) and handling problems (i.e. timeouts).

Bounty hunters: The bounty will go to the first answer that links to a robust and free ($$) .NET library that provides an easy C# way to parallelize asynchronous tasks such as HttpWebRequest.BeginGetResponse and SqlCommand.BeginExecuteNonQuery. The parallelization must not wait for N tasks to complete and then start the next N; it must start a new task as soon as one of the N initial tasks finishes. It must also provide timeout handling.

+2

multithreading parallel-processing .net connection

Jader dias Jan 27 '09 at 18:03


11 answers

Look into using a counting semaphore for the connections. http://en.wikipedia.org/wiki/Semaphore_(programming)

EDIT: In response to your comment, the .NET Framework already has one. http://msdn.microsoft.com/en-us/library/system.threading.semaphore.aspx
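As a rough illustration of the counting-semaphore idea, here is a minimal sketch. It swaps in SemaphoreSlim (a lighter, in-process relative of the Semaphore class linked above) because its WaitAsync method, available from .NET 4.5, lets the loop wait without blocking a thread. The names SemaphoreThrottle, RunAsync and work are made up for this example, and Task.Delay stands in for a real request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottle
{
    // Runs work(item) for every item, with at most maxConcurrent in flight.
    // A new item starts as soon as any running one releases the semaphore.
    public static async Task RunAsync<T>(IEnumerable<T> items, Func<T, Task> work, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent, maxConcurrent))
        {
            var running = new List<Task>();
            foreach (T item in items)
            {
                await gate.WaitAsync();   // waits while maxConcurrent operations are running
                running.Add(RunOneAsync(item, work, gate));
            }
            // WhenAll waits for every task, so all releases happen before Dispose.
            await Task.WhenAll(running);
        }
    }

    private static async Task RunOneAsync<T>(T item, Func<T, Task> work, SemaphoreSlim gate)
    {
        try { await work(item); }
        finally { gate.Release(); }       // always free the slot, even on failure
    }

    public static void Main()
    {
        // Assumption: 50 is the connection limit from the question.
        var pages = Enumerable.Range(1, 200).Select(i => "page-" + i);
        RunAsync(pages, async page =>
        {
            await Task.Delay(20);         // stand-in for an asynchronous download
        }, 50).Wait();
        Console.WriteLine("all pages processed");
    }
}
```

Timeouts are not handled here; they would belong inside the work delegate.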

+5

toad Jan 27 '09 at 18:15


See the CCR. This is the right way to do this, although you may find that the library has a bit of a learning curve ...

+4

Matt davison Feb 01 '09 at 14:23


You can use the .NET class System.Threading.ThreadPool. You can set the maximum number of threads that are active at any one time using ThreadPool.SetMaxThreads().
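A minimal sketch of that call, under some assumptions: 8 is the thread limit from the question, and PoolCapDemo and TryCapWorkers are names invented for this example. Note that the setting is process-wide, and that SetMaxThreads returns false and changes nothing when the requested value is below the number of processors or below the pool's current minimum.

```csharp
using System;
using System.Threading;

public static class PoolCapDemo
{
    // Tries to cap the pool's worker threads; leaves the I/O thread cap as it is.
    public static bool TryCapWorkers(int maxWorkers)
    {
        ThreadPool.GetMaxThreads(out int currentWorkers, out int ioThreads);
        Console.WriteLine("current max: {0} worker, {1} I/O", currentWorkers, ioThreads);
        return ThreadPool.SetMaxThreads(maxWorkers, ioThreads);
    }

    public static void Main()
    {
        // Assumption: 8 threads, as in the question. On a machine with more
        // than 8 processors this call is rejected and returns false.
        bool applied = TryCapWorkers(8);
        Console.WriteLine("cap applied: " + applied);

        const int jobs = 20;
        using (var done = new CountdownEvent(jobs))
        {
            for (int i = 0; i < jobs; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Thread.Sleep(100);    // stand-in for a blocking page load
                    done.Signal();
                });
            }
            done.Wait();                  // block until every queued job has run
        }
        Console.WriteLine("all jobs finished");
    }
}
```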

+3

Kev Jan 27 '09 at 18:09


Here's what I don't get: you say max 50 connections, but only 8 threads. Each connection, by definition, "occupies" / runs on a thread. I mean, you aren't using DMA or any other magic to take the load off the CPU, so each transfer needs an execution context. If you can run 50 asynchronous requests at the same time, fine, do it - you can start them all from a single thread, since calling an async read function takes virtually no time. If you have, for example, 8 cores and want to make sure an entire core is dedicated to each transfer (which would probably be dumb, but it's your code, so ...), you can only start 8 transfers at once.

My suggestion is to simply start 50 asynchronous requests inside a synchronized block, so that they have all started before you allow any of them to complete (this simplifies the math). Then use the counting semaphore suggested by Jeremy, or the synchronized queue suggested by mbeckish, to keep track of the remaining work. At the end of your asynchronous callback, start the next connection (if there is one). That is, start 50 connections, and as each one finishes, use its "completed" event handler to start the next, until all the work is done. This should not require any additional libraries or frameworks.
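The "start 50, then let each completion start the next one" scheme described above can be sketched roughly as follows. This version uses Task-based async methods (C# 7 or later) rather than the Begin/End callbacks of the time, and ChainedStarter, RunAsync and startAsync are names invented for the example. Each of the N "lanes" pulls the next item from a shared queue the moment its previous operation finishes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ChainedStarter
{
    // Keeps up to maxConcurrent operations in flight. When one finishes,
    // its lane immediately starts the next queued item.
    public static Task RunAsync<T>(IEnumerable<T> items, Func<T, Task> startAsync, int maxConcurrent)
    {
        var pending = new ConcurrentQueue<T>(items);

        async Task Lane()
        {
            // If startAsync throws, this lane stops; the other lanes keep
            // draining the queue and the returned task ends up faulted.
            while (pending.TryDequeue(out T item))
                await startAsync(item);
        }

        // Calling Lane() starts it; WhenAll completes when the queue is empty
        // and every lane has finished its last item.
        return Task.WhenAll(Enumerable.Range(0, maxConcurrent).Select(_ => Lane()));
    }

    public static void Main()
    {
        int inFlight = 0, peak = 0;
        object sync = new object();
        var urls = Enumerable.Range(1, 200).Select(i => "page-" + i);

        // Assumption: 50 is the connection limit from the question.
        RunAsync(urls, async url =>
        {
            lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
            await Task.Delay(20);         // stand-in for an asynchronous request
            lock (sync) { inFlight--; }
        }, 50).Wait();

        Console.WriteLine("peak concurrency: " + peak + " (limit 50)");
    }
}
```

No semaphore is needed in this variant because the number of lanes is itself the limit.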

+3

Coderer Feb 04 '09 at 16:12


I highly recommend staying away from the thread pool, except for very short tasks. If you do decide to use a semaphore, make sure you only block in the code that queues the work items, not at the start of the work item code itself, or you will quickly deadlock the thread pool if (semaphore max count * 2) is greater than the maximum number of pool threads.

In practice, you can never safely acquire a lock on a thread pool thread, nor can you safely call most async APIs from one (or sync APIs such as HttpWebRequest.GetResponse, since it also performs async operations under the covers on the thread pool).

+2

Matt davison Jan 27 '09 at 19:54


Create a data structure to keep track of which pages have been fetched and which still need to be fetched, e.g. a queue.
Using the Producer/Consumer Queue pattern, dispatch 8 consumer threads to do the fetching. This way, you know that you will never exceed the 8-thread limit.

See here for a good example.
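For readers without the linked example, a minimal sketch of the producer/consumer idea might look like this. It assumes .NET 4 or later for BlockingCollection; ConsumerPool, Run and process are names made up for this example, and Thread.Sleep stands in for a blocking page fetch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ConsumerPool
{
    // Feeds items into a queue drained by a fixed number of dedicated threads,
    // so no more than consumerCount items are ever processed at once.
    public static void Run<T>(IEnumerable<T> items, Action<T> process, int consumerCount)
    {
        using (var queue = new BlockingCollection<T>())
        {
            var consumers = new List<Thread>();
            for (int i = 0; i < consumerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Blocks until an item arrives; ends after CompleteAdding
                    // once the queue is empty. An exception thrown by process
                    // is not caught here and would end the process.
                    foreach (T item in queue.GetConsumingEnumerable())
                        process(item);
                });
                worker.Start();
                consumers.Add(worker);
            }

            foreach (T item in items)
                queue.Add(item);          // producer side
            queue.CompleteAdding();       // signal that no more work is coming

            foreach (Thread consumer in consumers)
                consumer.Join();          // wait for the consumers to drain the queue
        }
    }

    public static void Main()
    {
        var threadIds = new ConcurrentDictionary<int, bool>();
        var pages = Enumerable.Range(1, 100).Select(i => "page-" + i);

        // Assumption: 8 threads, as in the question.
        Run(pages, page =>
        {
            threadIds[Thread.CurrentThread.ManagedThreadId] = true;
            Thread.Sleep(10);             // stand-in for a blocking page fetch
        }, 8);

        Console.WriteLine("distinct consumer threads used: " + threadIds.Count);
    }
}
```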

+2

mbeckish Jan 27 '09 at 20:12


Jeffrey Richter has a Power Threading Library that can help you. It is packed with samples and is quite powerful. I could not find a quick example that deals with connections, but there are many examples on coordinating several asynchronous operations that may work for you.

It can be downloaded here, and there are several articles and samples here. In addition, this link contains a detailed article by Jeffrey explaining concurrent asynchronous operations.

+2

Sailing judo Feb 01 '09 at 14:20


The async WebRequest methods can appear sluggish, because they block while performing the DNS lookup and only then switch to asynchronous behavior. Given that, it seems inefficient to spin up eight threads just to feed requests to an API that already spins up threads to do the bulk of the work. You may want to rethink parts of your approach to work around this shortcoming of the asynchronous WebRequest API. Our solution ultimately involved using the synchronous API, with each call running on its own thread. I would be interested in anyone's comments on the correctness of this approach.

+1

spender Feb 01 '09 at 15:17


Here is how you can do this with the base class library in .NET 3.5. Calling SetMinThreads is optional - see what happens without it.

You should handle timeouts in your replacement for DoSomethingThatsSlow.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;

    public class ThrottledParallelRunnerTest
    {
        public static void Main()
        {
            // since the process is just starting up, we need to boost this
            ThreadPool.SetMinThreads(10, 10);
            IEnumerable<string> args = from i in Enumerable.Range(1, 100) select "task #" + i;
            ThrottledParallelRun(DoSomethingThatsSlow, args, 8);
        }

        public static void DoSomethingThatsSlow(string urlOrWhatever)
        {
            Console.Out.WriteLine("{1}: began {0}", urlOrWhatever, DateTime.Now.Ticks);
            Thread.Sleep(500);
            Console.Out.WriteLine("{1}: ended {0}", urlOrWhatever, DateTime.Now.Ticks);
        }

        private static void ThrottledParallelRun<T>(Action<T> action, IEnumerable<T> args, int maxThreads)
        {
            // this thing looks after the throttling
            Semaphore semaphore = new Semaphore(maxThreads, maxThreads);

            // wrap the action in a try/finally that releases the semaphore
            Action<T> releasingAction = a =>
            {
                try { action(a); }
                finally { semaphore.Release(); }
            };

            // store all the IAsyncResults - helps prevent the method from returning before completion
            List<IAsyncResult> results = new List<IAsyncResult>();
            foreach (T a in args)
            {
                // block here, in the queueing code, until one of the maxThreads slots is free
                semaphore.WaitOne();
                // delegate BeginInvoke runs on the thread pool (.NET Framework only;
                // it throws PlatformNotSupportedException on .NET Core and later)
                results.Add(releasingAction.BeginInvoke(a, null, null));
            }

            // now let's make sure everything returned. Maybe collate exceptions here?
            foreach (IAsyncResult result in results)
            {
                releasingAction.EndInvoke(result);
            }
        }
    }

+1

Rob Fonseca-Ensor Feb 06 '09 at 8:03


You should take a look at F# asynchronous workflows.

What you really want is for your code to be asynchronous, not parallel.

"Asynchronous" refers to programs that perform some long-running operations that do not need to block the calling thread, for example accessing the network, calling web services, or performing any other I/O operation in general.

This is a very interesting article about this concept, explained using C# iterators.

This is a great book on F# and asynchronous programming.

The learning curve is very steep (a lot of odd stuff: F# syntax, the Async<'a> type, monads, etc.), but it is a VERY powerful approach and can be used in real life thanks to excellent C# interop.

The basic idea here is continuations: while you are waiting for some I/O operation, let your threads do something else!

+1

Luca martinetti Feb 07 '09 at 3:22


Chaowlert chaisrichalermpol · Accepted Answer · 2009-02-01T14:22:02+0000

Can you give more information on why Parallel LINQ will not work?

As I see it, your task is well suited to PLINQ. If you run on a machine with 8 cores, PLINQ will split the work into 8 tasks and queue all the remaining tasks for you.

Here is the draft code:

PagesToDownload.AsParallel().ForAll(DownloadMethodWithLimitConnections);

I do not understand why PLINQ would exhaust your resources. Based on my tests, PLINQ even performs better than using the ThreadPool directly.
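If the default degree of parallelism is the issue, the PLINQ that shipped with .NET 4 lets you set an explicit upper bound with WithDegreeOfParallelism. The sketch below is a hedged illustration, not the answer's own code: PlinqThrottle, DownloadAll and the download delegate are hypothetical names, and Thread.Sleep stands in for a blocking page load. The cap applies to threads; with blocking calls that also means at most one connection per thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqThrottle
{
    // Runs download(page) for every page using at most maxThreads concurrent
    // PLINQ tasks; remaining pages wait until a task becomes free.
    public static void DownloadAll(IEnumerable<string> pagesToDownload, Action<string> download, int maxThreads)
    {
        pagesToDownload.AsParallel()
                       .WithDegreeOfParallelism(maxThreads)
                       .ForAll(download);
    }

    public static void Main()
    {
        int completed = 0;
        var pages = Enumerable.Range(1, 40).Select(i => "page-" + i).ToList();

        // Assumption: 8 threads, as in the question.
        DownloadAll(pages, page =>
        {
            Thread.Sleep(50);             // stand-in for a blocking page load
            Interlocked.Increment(ref completed);
        }, 8);

        Console.WriteLine("pages downloaded: " + completed);
    }
}
```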
