I have a crawler that scans several different domains for new posts and content. The total amount of content is around one hundred thousand pages, and a lot of new content is added every day, so to crawl all of it the crawler needs to run 24 hours a day.
Currently the crawler script is hosted on the same server as the site that the crawler adds content to, and I can only run the cronjob at night, because while the script is running the website basically stops responding due to the load. In other words, a pretty crappy solution.
So basically I'm wondering what my best option is for setting this up:
Is it possible to keep running the crawler on the same host, but somehow throttle or balance the load so that the script does not kill the website?
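To give an idea of the kind of throttling I mean, here is a minimal sketch (assuming a Python crawler; the URLs and delay value are placeholders, and my actual script may work differently) that simply spaces requests out instead of hitting pages in a burst:

    import time
    import urllib.request

    # Placeholder list of pages to crawl -- the real crawler builds this dynamically.
    urls = [
        "https://example.com/forum/page1",
        "https://example.com/forum/page2",
    ]

    REQUEST_DELAY = 2  # seconds to wait between requests

    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as response:
            html = response.read()
        # ... parse the page, save images, queue newly discovered links ...
        time.sleep(REQUEST_DELAY)  # spread the load out over time

Is something like that enough, or would running the cronjob under nice/ionice be the better way to keep it from starving the web server?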
What kind of host / server should I look for to run the crawler? Are there any specs I need beyond regular web hosting?
The crawler saves the images that it finds. If I host the crawler on a secondary server, how do I save those images to my site's server? I assume I don't want to chmod 777 my uploads folder and let anyone upload files to my server.
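The approach I'm currently leaning toward is pushing each downloaded image to the main server over SFTP with a restricted SSH account, rather than opening up the folder permissions. A rough sketch of what I have in mind (assuming Python with the paramiko library; the host, user, key path and directories are all placeholders):

    import paramiko

    # Placeholder connection details -- a dedicated, restricted SSH user for uploads.
    HOST = "www.example.com"
    USERNAME = "crawler-upload"
    KEY_FILE = "/home/crawler/.ssh/id_rsa"
    REMOTE_UPLOAD_DIR = "/var/www/site/uploads"

    def upload_image(local_path, filename):
        """Copy one downloaded image to the main web server over SFTP."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(HOST, username=USERNAME, key_filename=KEY_FILE)
        try:
            sftp = client.open_sftp()
            sftp.put(local_path, f"{REMOTE_UPLOAD_DIR}/{filename}")
            sftp.close()
        finally:
            client.close()

    upload_image("/tmp/photo123.jpg", "photo123.jpg")

That way only the holder of the key can write to the uploads folder, but I'm not sure if this is the usual way to do it or if there is a better option.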
performance webserver web-crawler hosting