It is not clear what you mean by crawler instances. If you want to run the crawl script several times in parallel, e.g. with separate configurations, seed lists, etc., then the jobs will compete for slots on the Hadoop cluster. It will then come down to how many map/reduce slots are available on your cluster, which itself depends on the number of slave nodes.
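For reference, on a classic Hadoop 1.x (MRv1) cluster the per-node slot counts that parallel crawls would compete for are set in mapred-site.xml. A minimal sketch (the values shown are illustrative, not recommendations):

```xml
<!-- mapred-site.xml on each tasktracker node (Hadoop 1.x / MRv1) -->
<configuration>
  <!-- Maximum map slots this node offers to all jobs -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum reduce slots this node offers to all jobs -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Note that on YARN (Hadoop 2.x+) there are no fixed slots; concurrency is instead bounded by container resources such as `yarn.nodemanager.resource.memory-mb`, but the same contention argument applies.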
Running multiple Nutch crawls in parallel can be complex and resource-inefficient. Instead, rethink your architecture so that all the logical crawlers run as a single physical one, or have a look at StormCrawler, which should be better suited for this.
Julien Nioche