How to limit concurrent connections used by cURL

I created a simple web crawler using PHP (and cURL). It analyzes roughly 60,000 HTML pages and returns product information (this is an intranet tool).

My main problem is simultaneous connections. I would like to limit the number of connections so that, no matter what happens, the crawler never uses more than 15 simultaneous connections.

The server blocks an IP address once it reaches 25 concurrent connections, and for some reason I cannot change this on the server side, so I need to find a way to make my script never use more than X concurrent connections.

Is it possible?

Or maybe I should rewrite all this in another language?

Thanks, any help is appreciated!

php libcurl web-crawler
3 answers

You can use curl_setopt($ch, CURLOPT_MAXCONNECTS, 15); to limit the number of connections. But you can also write a simple connection manager if that doesn't do it for you.
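For instance, a minimal sketch (the URL is a placeholder; note that CURLOPT_MAXCONNECTS caps a single handle's connection cache, so if you use curl_multi you may also want to look at CURLMOPT_MAX_TOTAL_CONNECTIONS on libcurl 7.30+):

    <?php
    // Minimal sketch: cap the connection cache of one easy handle.
    // The URL is a placeholder for one of the crawler's product pages.
    $ch = curl_init('http://intranet.example/product/1');
    curl_setopt($ch, CURLOPT_MAXCONNECTS, 15);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);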


Maybe keep a simple connection table:

    target_IP | active_connections
    1.2.3.4   | 10
    4.5.6.7   | 5

Each cURL call would increment the count for its target IP, and each completed request would decrement it.

You can keep the table in MySQL, or in Memcached for speed.

When you hit an IP address that is already at its maximum number of connections, you will have to implement a "try again" queue.
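A hedged sketch of that counter using Memcached (the key scheme, limit, and server address are assumptions; increment()/decrement() are atomic, which keeps concurrent workers from racing on the count):

    <?php
    // Sketch only: per-IP connection counter in Memcached.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211); // assumed server address

    define('MAX_CONN_PER_IP', 15);

    function acquire_slot(Memcached $mc, string $ip): bool {
        $key = "conn:$ip";          // assumed key scheme
        $mc->add($key, 0);          // create the counter if missing
        $n = $mc->increment($key);  // atomic increment
        if ($n !== false && $n <= MAX_CONN_PER_IP) {
            return true;            // slot acquired
        }
        if ($n !== false) {
            $mc->decrement($key);   // over the limit: roll back
        }
        return false;
    }

    function release_slot(Memcached $mc, string $ip): void {
        $mc->decrement("conn:$ip");
    }

    // Usage: retry later if the target IP is saturated.
    $ip = '1.2.3.4';
    if (acquire_slot($mc, $ip)) {
        // ... run the cURL request, then:
        release_slot($mc, $ip);
    } else {
        // push the URL onto the "try again" queue
    }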


My answer to another question contains code for this with curl_multi_*.
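Since the linked code isn't quoted here, a minimal rolling-window sketch along the same lines (the URL list and result handling are placeholders): at most $limit handles run at once, and a new URL is added each time one finishes.

    <?php
    // Sketch: keep at most $limit transfers in flight with curl_multi_*.
    function crawl(array $urls, int $limit = 15): void {
        $mh = curl_multi_init();
        $active = 0;

        // Add the next URL (if any) to the multi handle.
        $add = function () use (&$urls, $mh, &$active) {
            if ($url = array_shift($urls)) {
                $ch = curl_init($url);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                curl_multi_add_handle($mh, $ch);
                $active++;
            }
        };

        // Fill the initial window.
        for ($i = 0; $i < $limit; $i++) {
            $add();
        }

        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);
            // Reap finished transfers and refill the window.
            while ($info = curl_multi_info_read($mh)) {
                $ch = $info['handle'];
                // ... process curl_multi_getcontent($ch) here ...
                curl_multi_remove_handle($mh, $ch);
                curl_close($ch);
                $active--;
                $add();
            }
        } while ($active > 0);

        curl_multi_close($mh);
    }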

