How do I parse a robots.txt file in Java?
Is there any existing code for this?
Heritrix is an open-source web crawler written in Java. Looking through its javadoc, I see that it has a Robotstxt utility class for parsing a robots.txt file.
There is also a jrobotx library hosted at SourceForge.
(Full disclosure: I wrote the code that forms this library.)
There is also a newer library, crawler-commons:
https://github.com/crawler-commons/crawler-commons
The library is designed to implement functionality common to any web crawler, and that includes a convenient robots.txt parser.
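If you'd rather not pull in a dependency, here is a minimal sketch of the core idea in plain Java: collect the Disallow rules for a matching User-agent group (including the `*` wildcard) and check paths by prefix match. Note this is a simplified illustration, not a substitute for a real parser like the ones above; it ignores Allow lines, crawl-delay, wildcard patterns, and multi-line user-agent groups. The class and method names are my own, not from any of the libraries mentioned.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal robots.txt parser sketch. Collects Disallow rules that apply to
// the given user-agent (or "*") and answers isAllowed() by prefix match.
public class SimpleRobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    public SimpleRobotsTxt(String content, String userAgent) {
        boolean applies = false;
        for (String rawLine : content.split("\\r?\\n")) {
            // Strip comments and surrounding whitespace
            String line = rawLine.split("#", 2)[0].trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Start of a new record group: does it apply to us?
                applies = value.equals("*") || value.equalsIgnoreCase(userAgent);
            } else if (applies && field.equals("disallow") && !value.isEmpty()) {
                // An empty Disallow value means "allow everything", so skip it
                disallowed.add(value);
            }
        }
    }

    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\nDisallow: /tmp\n";
        SimpleRobotsTxt rules = new SimpleRobotsTxt(robots, "MyCrawler");
        System.out.println(rules.isAllowed("/index.html"));   // prints true
        System.out.println(rules.isAllowed("/private/data")); // prints false
    }
}
```

For production use, prefer one of the libraries above, which handle the full Robots Exclusion Protocol rather than just prefix-based Disallow rules.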