For a comparison between Nutch and StormCrawler, see my DZone article.
Heritrix can be run in distributed mode, but the documentation does not explain clearly how to do this. Nutch and StormCrawler rely on well-established distributed computing platforms (Apache Hadoop and Apache Storm respectively), whereas Heritrix does not.
Heritrix is also used mainly by the archiving community, whereas Nutch and StormCrawler cover a wider range of use cases (for example, indexing and scraping) and have more resources for data extraction.
I am not familiar with the two hosted services you mentioned, since I only use open source software.
Julien Nioche