When I run Nutch 1.10 with the following command (TestCrawl2 did not exist beforehand, so it should be created):
sudo -E bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TestCrawlCore2 urls/ TestCrawl2/ 20
I get an error when indexing:
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/apache-nutch-1.10/TestCrawl2/linkdb/current
The linkdb directory exists, but it does not contain the "current" directory. The directory is owned by root, so there should not be any permission problems. Because the process exited with this error, the linkdb directory still contains the .locked and ..locked.crc files. If I run the command again, these lock files cause it to exit at the same place. The only way out is to remove the TestCrawl2 directory and start over (rinse, repeat).
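To make the failure state described above easy to check before each rerun, here is a small sketch. `check_linkdb` is a hypothetical helper (not part of Nutch); it only assumes the crawl layout shown in the error message, i.e. `<crawl dir>/linkdb/current`, and the lock file names `.locked` and `..locked.crc`. It reports whether `current` exists and lists any leftover lock files. It does not fix the underlying indexing error.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): inspect a crawl's linkdb before re-running bin/crawl.
# Assumes the layout from the error message: <crawl dir>/linkdb/current
check_linkdb() {
  local crawl_dir="$1"
  local linkdb="$crawl_dir/linkdb"

  if [ -d "$linkdb/current" ]; then
    echo "linkdb OK: $linkdb/current exists"
    return 0
  fi

  echo "missing: $linkdb/current"
  # Report lock files left behind by the failed run; they make the next run exit at the same place.
  for f in "$linkdb/.locked" "$linkdb/..locked.crc"; do
    if [ -e "$f" ]; then
      echo "stale lock: $f"
    fi
  done
  return 1
}
```

Usage, run from the Nutch directory: `check_linkdb TestCrawl2 || echo "linkdb incomplete; clean up before re-running"`.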
Note that the Nutch and Solr installations themselves worked without problems for the earlier TestCrawl instance; the problems only appear now that I try to create a new one. Any suggestions for fixing this?