How to limit concurrent connections used by cURL

I created a simple web crawler using PHP (and cURL). It analyzes roughly 60,000 HTML pages and returns product information (this is an intranet tool).

My main problem is simultaneous connections. I would like to limit the number of connections so that, no matter what happens, the crawler never uses more than 15 simultaneous connections.

The server blocks an IP address once it reaches 25 concurrent connections, and for some reason I cannot change this on the server side, so I need to find a way to make my script never use more than X concurrent connections.

Is it possible?

Or maybe I should rewrite all this in another language?

Thanks, any help is appreciated!

php libcurl web-crawler
3 answers

You can use curl_setopt($ch, CURLOPT_MAXCONNECTS, 15); to limit the number of connections. But you can also write a simple connection manager if that doesn't do it for you.
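For instance, a minimal sketch (the URL is a placeholder; note that CURLOPT_MAXCONNECTS caps a single handle's connection cache, so if you use curl_multi you may also want to look at CURLMOPT_MAX_TOTAL_CONNECTIONS on libcurl 7.30+):

    <?php
    // Minimal sketch: cap the connection cache of one easy handle.
    // The URL is a placeholder for one of the crawler's product pages.
    $ch = curl_init('http://intranet.example/product/1');
    curl_setopt($ch, CURLOPT_MAXCONNECTS, 15);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);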


Maybe keep a simple connection table:

    target_IP | active_connections
    1.2.3.4   | 10
    4.5.6.7   | 5

Each cURL call would increment the count for its target IP, and each completed request would decrement it.

You can keep the table in MySQL, or in Memcached for speed.

When you hit an IP address that is already at its maximum number of connections, you will have to implement a "try again" queue.
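A hedged sketch of that counter using Memcached (the key scheme, limit, and server address are assumptions; increment()/decrement() are atomic, which keeps concurrent workers from racing on the count):

    <?php
    // Sketch only: per-IP connection counter in Memcached.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211); // assumed server address

    define('MAX_CONN_PER_IP', 15);

    function acquire_slot(Memcached $mc, string $ip): bool {
        $key = "conn:$ip";          // assumed key scheme
        $mc->add($key, 0);          // create the counter if missing
        $n = $mc->increment($key);  // atomic increment
        if ($n !== false && $n <= MAX_CONN_PER_IP) {
            return true;            // slot acquired
        }
        if ($n !== false) {
            $mc->decrement($key);   // over the limit: roll back
        }
        return false;
    }

    function release_slot(Memcached $mc, string $ip): void {
        $mc->decrement("conn:$ip");
    }

    // Usage: retry later if the target IP is saturated.
    $ip = '1.2.3.4';
    if (acquire_slot($mc, $ip)) {
        // ... run the cURL request, then:
        release_slot($mc, $ip);
    } else {
        // push the URL onto the "try again" queue
    }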


My answer to another question contains code for this with curl_multi_*.
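Since the linked code isn't quoted here, a minimal rolling-window sketch along the same lines (the URL list and result handling are placeholders): at most $limit handles run at once, and a new URL is added each time one finishes.

    <?php
    // Sketch: keep at most $limit transfers in flight with curl_multi_*.
    function crawl(array $urls, int $limit = 15): void {
        $mh = curl_multi_init();
        $active = 0;

        // Add the next URL (if any) to the multi handle.
        $add = function () use (&$urls, $mh, &$active) {
            if ($url = array_shift($urls)) {
                $ch = curl_init($url);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                curl_multi_add_handle($mh, $ch);
                $active++;
            }
        };

        // Fill the initial window.
        for ($i = 0; $i < $limit; $i++) {
            $add();
        }

        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh);
            // Reap finished transfers and refill the window.
            while ($info = curl_multi_info_read($mh)) {
                $ch = $info['handle'];
                // ... process curl_multi_getcontent($ch) here ...
                curl_multi_remove_handle($mh, $ch);
                curl_close($ch);
                $active--;
                $add();
            }
        } while ($active > 0);

        curl_multi_close($mh);
    }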

