HttpClient.NET Disposal

I am developing a .NET 4-based application that needs to make requests to third-party servers to retrieve information from them. I am using HttpClient to execute these HTTP requests.

I need to make one hundred or one thousand requests in a short period of time. I would like to cap the number of concurrent requests at some limit (defined by a constant or similar) so that the other servers are not flooded with requests.

I checked this link, which shows how to limit the number of tasks running at any one time.

Here is my non-working approach:

    // create the factory
    var factory = new TaskFactory(new LimitedConcurrencyLevelTaskScheduler(level));

    // use the factory to create a new task that will make the request to the third-party server
    var task = factory.StartNew(() =>
    {
        return new HttpClient().GetAsync(url);
    }).Unwrap();

Of course, the problem is that even though only one task at a time is created on the limited scheduler, many requests end up being created and processed at the same time, because the asynchronous HTTP work does not actually run on that scheduler. I could not find a way to make HttpClient use a specific scheduler.

How should I deal with this situation? I would like to limit the number of outstanding requests to a certain limit, without blocking while waiting for those requests to complete.

Is it possible? Any ideas?

+6
4 answers

If you can use .NET 4.5, one way would be to use a TransformBlock from TPL Dataflow and set its MaxDegreeOfParallelism. Something like:

    var block = new TransformBlock<string, byte[]>(
        url => new HttpClient().GetByteArrayAsync(url),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = level });

    foreach (var url in urls)
        block.Post(url);
    block.Complete();

    var result = new List<byte[]>();
    while (await block.OutputAvailableAsync())
        result.Add(block.Receive());

Another way to look at this is ServicePointManager. With this class you can set limits such as MaxServicePoints (how many servers you can be connected to at a time) and DefaultConnectionLimit (how many connections are allowed per server). That way you could start all your Tasks at the same time, but only a limited number of them would actually be doing anything. Still, limiting the number of Tasks (for example with TPL Dataflow, as suggested above) will most likely be more efficient.
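For illustration, a minimal sketch of the ServicePointManager approach; the limit values below are arbitrary assumptions, not recommendations:

    // set these once at startup, before any requests are made (System.Net)
    ServicePointManager.MaxServicePoints = 50;        // how many servers you can be connected to at once
    ServicePointManager.DefaultConnectionLimit = 10;  // how many concurrent connections per server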

+1

You might consider creating a new DelegatingHandler to sit in the HttpClient request/response pipeline, which could count the number of pending requests.

Typically, a single HttpClient is used to process multiple requests. Unlike HttpWebRequest, disposing an HttpClient instance closes the underlying TCP/IP connection, so if you want to reuse connections, you really need to reuse HttpClient instances.
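A minimal sketch of what such a handler could look like, using a SemaphoreSlim to cap the number of requests in flight (this assumes .NET 4.5 for async/await and SemaphoreSlim.WaitAsync; the ThrottlingHandler class and maxConcurrency parameter are illustrative names, not an existing API):

    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public class ThrottlingHandler : DelegatingHandler
    {
        private readonly SemaphoreSlim _semaphore;

        public ThrottlingHandler(int maxConcurrency)
            : base(new HttpClientHandler())
        {
            _semaphore = new SemaphoreSlim(maxConcurrency);
        }

        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            // wait for a free slot before sending; this is what bounds concurrency
            await _semaphore.WaitAsync(cancellationToken);
            try
            {
                return await base.SendAsync(request, cancellationToken);
            }
            finally
            {
                _semaphore.Release();
            }
        }
    }

You would then build a single reusable client around it, e.g. var client = new HttpClient(new ThrottlingHandler(10));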

+1

First, you should consider partitioning the workload by website, or at least exposing an abstraction that lets you choose how to split the list of URLs. For example, one strategy could partition by second-level domain, e.g. yahoo.com, google.com.
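As a rough sketch of that idea, grouping by host (a simple stand-in for true second-level-domain partitioning; urls is assumed to be a collection of absolute URL strings):

    // requires System.Linq; one bucket of work per host
    var partitions = urls
        .GroupBy(u => new Uri(u).Host)               // e.g. "www.yahoo.com", "www.google.com"
        .ToDictionary(g => g.Key, g => g.ToList());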

Another thing is that if you are doing any serious crawling, you may want to consider running it in the cloud instead, so that each node can crawl a different partition. When you say "short period of time," you are already setting yourself up for failure. You need hard numbers on what you want to achieve.

Another key benefit of partitioning is that you can also avoid hitting servers at peak hours and risking IP bans at the router level if the site doesn't simply throttle you.

0

You might consider starting a fixed set of threads. Each thread performs the client's network operations in turn, possibly also pausing at certain points to throttle. This gives you precise control over the load; you can change the throttling policy and the number of threads.
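A minimal sketch of that pattern, assuming a shared queue of URLs and made-up threadCount and delayBetweenRequests values; because each worker blocks on its own request, the number of requests in flight can never exceed the thread count:

    // requires System.Collections.Concurrent, System.Linq, System.Net.Http, System.Threading
    var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());
    foreach (var url in urls) queue.Add(url);
    queue.CompleteAdding();

    var client = new HttpClient();
    var threads = Enumerable.Range(0, threadCount)
        .Select(_ => new Thread(() =>
        {
            foreach (var url in queue.GetConsumingEnumerable())
            {
                // block this worker on the async call; concurrency stays bounded by threadCount
                var data = client.GetByteArrayAsync(url).Result;
                // ... process data ...
                Thread.Sleep(delayBetweenRequests);  // simple throttle between requests
            }
        }))
        .ToList();

    threads.ForEach(t => t.Start());
    threads.ForEach(t => t.Join());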

0
