It is not clear what you mean by crawler instances. If you want to run the crawl script several times in parallel, e.g. with separate configurations, seed lists, etc., then the jobs will compete for slots on the Hadoop cluster. It will then come down to how many map/reduce slots are available on your cluster, which itself depends on the number of slave nodes.
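For reference, on a classic Hadoop 1.x (MRv1) cluster the per-node slot counts that parallel crawls would compete for are set in mapred-site.xml. A minimal sketch (the values shown are illustrative, not recommendations):

```xml
<!-- mapred-site.xml on each tasktracker node (Hadoop 1.x / MRv1) -->
<configuration>
  <!-- Maximum map slots this node offers to all jobs -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum reduce slots this node offers to all jobs -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Note that on YARN (Hadoop 2.x+) there are no fixed slots; concurrency is instead bounded by container resources such as `yarn.nodemanager.resource.memory-mb`, but the same contention argument applies.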
Running multiple Nutch crawls in parallel can be complex and resource-inefficient. Instead, rethink your architecture so that all the logical crawlers run as a single physical one, or have a look at StormCrawler, which should be better suited for this.
Julien Nioche