First of all, I would like to point out that this is not the kind of task you can run at just any hosting provider. I would guess you will get banned.
So I assume that you can compile software (VPS?) and run long-running processes in the background (using the PHP CLI). I would use Redis (I liked predis as a PHP client library) to push messages onto a list. (P.S. I would prefer to write this in node.js / Python (the explanation below works for PHP), because I think this task can be coded pretty quickly in those languages. I am going to try to write it and post the code on GitHub later.)
Redis:
Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push / pop elements, add / remove elements, perform server-side union, intersection, difference between sets, and so forth. Redis supports different kinds of sorting abilities.
Then start several worker processes that take messages from the list (blocking if none are available).
BLPOP:
This is where Redis gets really interesting. BLPOP and BRPOP are the blocking equivalents of the LPOP and RPOP commands. If the queue for any of the keys they specify has an item in it, that item will be popped and returned. If it does not, the Redis client will block until a key becomes available (or the timeout expires; specify 0 for an unlimited timeout).
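A minimal sketch of the push / blocking-pop pattern described above, written in Python rather than PHP. The function names `enqueue` and `run_worker` are hypothetical, and `client` is assumed to be any object exposing redis-py style `rpush` and `blpop` methods.

```python
def enqueue(client, key, urls):
    """Push each URL onto the right end of the list stored at `key`."""
    for url in urls:
        client.rpush(key, url)


def run_worker(client, key, handle, timeout=5):
    """Pop URLs from the left end of the list until it stays empty.

    Assumption: `client.blpop` behaves like redis-py's, returning a
    (key, value) pair, or None once `timeout` seconds pass without an
    item. A timeout of 0 would block forever instead.
    """
    processed = 0
    while True:
        item = client.blpop(key, timeout=timeout)
        if item is None:  # timed out: the queue stayed empty
            return processed
        _key, value = item
        handle(value)
        processed += 1
```

With redis-py installed and a server running, `client` could be `redis.Redis()`. Several copies of `run_worker` can run in separate processes against the same key, because each popped item is handed to exactly one of them.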
Curl is not exactly ping (ICMP Echo), but I think some servers may block ICMP requests (for security). I would first try ping (using an nmap snippet) and fall back to curl if the ping fails, because pinging is faster than using curl.
Libcurl:
A free client-side URL transfer library, supporting FTP, FTPS, Gopher (protocol), HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE, LDAP, LDAPS, IMAP, POP3, SMTP and RTSP (the last four only in versions newer than 7.20.0 or February 9, 2010).
Ping:
Ping is a computer network administration utility used to test the reachability of a host on an Internet Protocol (IP) network and to measure the round-trip time for messages sent from the originating host to a destination computer. The name comes from active sonar terminology. Ping operates by sending Internet Control Message Protocol (ICMP) echo request packets to the target host and waiting for an ICMP response.
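The ping-first, HTTP-fallback idea can be sketched roughly like this in Python. It shells out to the system `ping` instead of using nmap, and it assumes a Unix-style `ping` binary that accepts `-c 1`. The names `ping_once` and `host_is_up` are hypothetical, and `http_check` is a placeholder for whatever HTTP probe you use.

```python
import subprocess


def ping_once(host, timeout=2.0):
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumption: a Unix-style `ping` that understands `-c 1`.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # no ping binary, or no reply within the timeout
    return result.returncode == 0


def host_is_up(host, ping=ping_once, http_check=None):
    """Try the cheap ping first; fall back to HTTP only if it fails."""
    if ping(host):
        return True
    if http_check is None:
        return False
    return http_check(host)
```

Passing the probes in as arguments keeps the fallback logic separate from how each probe is implemented, so either one can be swapped out later.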
But then you should make a HEAD request and retrieve only the headers to check whether the host is up. Otherwise you would also download the content of the URL, which costs time and bandwidth.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
Then each worker process should use curl_multi to get some concurrency within each process. I think this link provides a good implementation of it (except that it does not make HEAD requests).
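For illustration, a HEAD-only check could look like this with Python's standard library (in PHP the same idea would go through curl). The function name `head_status` and the 5 second default timeout are my own assumptions.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0):
    """Send a HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...
```

Any non-None result means the host answered, so the caller can decide whether a 4xx or 5xx status still counts as "up".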
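To illustrate the per-process concurrency, here is a rough Python sketch. It is not curl_multi itself: it swaps in a thread pool, which gives a similar effect of many requests in flight at once. The names `head_ok` and `check_many` and the default of 10 workers are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def head_ok(url, timeout=5.0):
    """Return True if a HEAD request to `url` gets any HTTP response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the host answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False


def check_many(urls, max_workers=10, check=head_ok):
    """Check all URLs concurrently and map each URL to its result."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return dict(zip(urls, pool.map(check, urls)))
```

Each worker process would pop a batch of URLs from the Redis list and pass it to `check_many`, so the total concurrency is roughly the number of processes times `max_workers`.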