Robots.txt parser in Java

I want to know how to parse robots.txt in Java.

Is there any code?

+7
java parsing robots.txt
3 answers

Heritrix is an open source web crawler written in Java. Looking through their javadoc, I see that they have a Robotstxt utility class for parsing a robots.txt file.

+5

There is also a jrobotx library hosted at SourceForge.

(Full disclosure: I spun off the code that forms this library.)

+1

There is also a new release of crawler-commons:

https://github.com/crawler-commons/crawler-commons

The library aims to implement functionality common to any web crawler, and this includes a very convenient robots.txt parser.
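
For a rough idea of what such a parser does under the hood, here is a minimal sketch that uses only the JDK. It is not the crawler-commons API: the class name SimpleRobotsTxt and its methods are made up for illustration. It assumes only User-agent, Allow and Disallow lines matter, matches rules by plain path prefix (no * or $ wildcards), and lets the longest matching rule win. For real crawling, one of the libraries above is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative robots.txt rules for a single user agent (hypothetical class, not a library API). */
final class SimpleRobotsTxt {
    private final List<String> allow = new ArrayList<>();
    private final List<String> disallow = new ArrayList<>();

    /** Parses robots.txt text and returns the rules that apply to the given user agent. */
    static SimpleRobotsTxt parse(String content, String userAgent) {
        SimpleRobotsTxt specific = new SimpleRobotsTxt(); // groups naming this agent
        SimpleRobotsTxt wildcard = new SimpleRobotsTxt(); // groups for "User-agent: *"
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean foundSpecific = false;
        boolean inAgentLines = false;   // true while reading consecutive User-agent lines
        boolean matchesSpecific = false;
        boolean matchesWildcard = false;

        for (String raw : content.split("\\R")) {
            // Strip comments, then split "field: value".
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!inAgentLines) { // first User-agent line starts a new group
                    matchesSpecific = false;
                    matchesWildcard = false;
                    inAgentLines = true;
                }
                if (value.equals("*")) {
                    matchesWildcard = true;
                } else if (!value.isEmpty() && agent.contains(value.toLowerCase(Locale.ROOT))) {
                    matchesSpecific = true;
                    foundSpecific = true;
                }
            } else if (field.equals("allow") || field.equals("disallow")) {
                inAgentLines = false;
                if (value.isEmpty()) {
                    continue; // an empty Disallow restricts nothing
                }
                if (matchesSpecific) {
                    specific.add(field, value);
                }
                if (matchesWildcard) {
                    wildcard.add(field, value);
                }
            }
            // Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.
        }
        // A group that names the agent takes precedence over the "*" group.
        return foundSpecific ? specific : wildcard;
    }

    private void add(String field, String value) {
        (field.equals("allow") ? allow : disallow).add(value);
    }

    /** Longest matching prefix wins; on a tie (or no match at all) the path is allowed. */
    boolean isAllowed(String path) {
        return longestPrefix(allow, path) >= longestPrefix(disallow, path);
    }

    private static int longestPrefix(List<String> rules, String path) {
        int best = -1;
        for (String rule : rules) {
            if (path.startsWith(rule) && rule.length() > best) {
                best = rule.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\nAllow: /private/public/\n";
        SimpleRobotsTxt rules = SimpleRobotsTxt.parse(robots, "ExampleBot/1.0");
        System.out.println(rules.isAllowed("/private/secret.html"));
        System.out.println(rules.isAllowed("/private/public/page.html"));
    }
}
```

In a crawler you would fetch /robots.txt for each host once, call parse with your crawler's user-agent string, and check isAllowed with the URL path before each request.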

0
