I created a simple web crawler in PHP using cURL. It analyzes roughly 60,000 HTML pages and returns product information (it is an intranet tool).
My main problem is simultaneous connections. I would like to limit the number of connections so that, no matter what happens, the crawler never uses more than 15 simultaneous connections.
The server blocks an IP address once it reaches 25 concurrent connections, and for reasons outside my control I cannot change this on the server side, so I need to find a way to make my script never use more than X concurrent connections.
Is it possible?
Or maybe I should rewrite all this in another language?
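For reference, here is a minimal sketch of the rolling-window `curl_multi` approach I have in mind; the URL list, the `parsePage()` parser, and the timeout values are placeholders, not my actual code:

    <?php
    // Minimal rolling-window sketch: at most $maxConcurrent transfers in flight.
    // $urls and parsePage() are placeholders for the real page list and parser.

    function parsePage(string $html): void
    {
        // Placeholder: extract product information from $html here.
    }

    function crawl(array $urls, int $maxConcurrent = 15): void
    {
        $multi  = curl_multi_init();
        $active = 0;

        $addHandle = function (string $url) use ($multi): void {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_TIMEOUT        => 30,
            ]);
            curl_multi_add_handle($multi, $ch);
        };

        // Seed the window with the first handles, up to the cap.
        while ($active < $maxConcurrent && $urls) {
            $addHandle(array_shift($urls));
            $active++;
        }

        while ($active > 0) {
            curl_multi_exec($multi, $running);
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(100000); // avoid busy-looping if select fails
            }

            // Harvest finished transfers and refill the window one-for-one,
            // so the number of open connections never exceeds the cap.
            while ($info = curl_multi_info_read($multi)) {
                $ch = $info['handle'];
                parsePage(curl_multi_getcontent($ch));
                curl_multi_remove_handle($multi, $ch);
                curl_close($ch);
                $active--;

                if ($urls) {
                    $addHandle(array_shift($urls));
                    $active++;
                }
            }
        }

        curl_multi_close($multi);
    }

If the PHP and libcurl versions here are new enough (libcurl 7.30+), `curl_multi_setopt()` with `CURLMOPT_MAX_TOTAL_CONNECTIONS` might also enforce the cap directly, but I have not verified that on this setup.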
Thanks, any help is appreciated!