Is there a Java library that compares the similarity between web pages (HTML/DOM structure similarity)?
In my application, I want to classify the links of a website into groups, for example:
group 1: product detail page
group 2: category page (for online stores, etc.)
For this kind of classification, I think comparing the HTML structure (the DOM) is the best approach. Any help is appreciated.
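To illustrate what "DOM structure similarity" could mean in practice, here is a minimal sketch in plain Java. It is not an existing library. It assumes that a page's structure can be approximated by its sequence of tag names, turns that sequence into overlapping runs of k tags ("shingles"), compares two pages with the Jaccard index of their shingle sets, and assigns an unknown page the label whose example pages are most similar on average. The class and method names (`TagStructureSimilarity`, `tagSequence`, `shingles`, `similarity`, `classify`), the choice of k = 3, and the sample pages are all hypothetical. The regex-based tag extraction is a crude assumption; for real pages, a proper HTML parser such as jsoup would be more robust.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagStructureSimilarity {

    // Matches opening and closing tags: group 1 is an optional '/', group 2 the tag name.
    // Comments and <!DOCTYPE> are skipped because '!' is not a letter.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)");

    // Crude structure extraction: the ordered tag names, ignoring attributes and text.
    static List<String> tagSequence(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group(1) + m.group(2).toLowerCase());
        }
        return tags;
    }

    // Set of k-shingles: every contiguous run of k tags, joined with spaces.
    static Set<String> shingles(List<String> tags, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= tags.size(); i++) {
            result.add(String.join(" ", tags.subList(i, i + k)));
        }
        return result;
    }

    // Jaccard similarity of the shingle sets: size(intersection) / size(union), in [0, 1].
    static double similarity(String htmlA, String htmlB, int k) {
        Set<String> a = shingles(tagSequence(htmlA), k);
        Set<String> b = shingles(tagSequence(htmlB), k);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Picks the label whose example pages have the highest average similarity to the page.
    static String classify(String html, Map<String, List<String>> examplesByLabel, int k) {
        String bestLabel = null;
        double bestScore = -1.0;
        for (Map.Entry<String, List<String>> entry : examplesByLabel.entrySet()) {
            List<String> examples = entry.getValue();
            double total = 0.0;
            for (String example : examples) {
                total += similarity(html, example, k);
            }
            double average = examples.isEmpty() ? 0.0 : total / examples.size();
            if (average > bestScore) {
                bestScore = average;
                bestLabel = entry.getKey();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified sample pages.
        String product = "<html><body><div class=\"product\"><h1>Phone</h1><img src=\"a.jpg\">"
                + "<span class=\"price\">99</span><button>Buy</button></div></body></html>";
        String category = "<html><body><ul class=\"grid\"><li><a href=\"/p/1\">One</a></li>"
                + "<li><a href=\"/p/2\">Two</a></li></ul></body></html>";
        String unknown = "<html><body><div class=\"product\"><h1>Laptop</h1><img src=\"b.jpg\">"
                + "<span class=\"price\">899</span><button>Buy</button></div></body></html>";

        Map<String, List<String>> examples = new LinkedHashMap<>();
        examples.put("product detail", List.of(product));
        examples.put("category", List.of(category));

        System.out.println(classify(unknown, examples, 3)); // prints "product detail"
    }
}
```

Using a set of shingles means that a category page listing 20 products and one listing 50 produce largely the same shingles, which is usually what you want when grouping pages by template. A more exact but considerably more expensive alternative is tree edit distance between the two DOM trees; in either case, labelling a handful of example pages per group and tuning k on them is a reasonable starting point.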