Java: comparing the structure of web pages (DOM)

Is there a Java library that measures the similarity between web pages (HTML/DOM similarity)?

In my application, I want to classify the links of a website into groups by page type. For example, group 1: product detail pages; group 2: category pages (for online stores, etc.).

For this kind of classification, I think comparing the similarity of the HTML structure (DOM) is the best approach. Any help would be appreciated.

1 answer

Not quite what you are asking for, but if the HTML is valid XML, you can use XMLUnit to compare documents for similarity quite simply.
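To make the idea of a structural similarity score concrete, here is a minimal sketch that uses only the JDK's built-in XML parser rather than XMLUnit. It is a swapped-in technique, not what XMLUnit does: it collects the set of element paths (such as /html/body/div) from each page and computes the Jaccard similarity of the two sets. The class name DomSimilarity and its methods are hypothetical, and it assumes the input is well-formed XML without a DOCTYPE that points at an external DTD; real-world HTML would first need to be cleaned by a lenient HTML parser.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: scores structural similarity of two well-formed pages.
public class DomSimilarity {

    // Parses the markup and returns the set of distinct element paths it contains.
    public static Set<String> tagPaths(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            Set<String> paths = new HashSet<>();
            collect(doc.getDocumentElement(), "", paths);
            return paths;
        } catch (Exception e) {
            // Assumption: callers treat unparsable input as a bad argument.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Depth-first walk that records the path of every element node.
    private static void collect(Node node, String prefix, Set<String> paths) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, etc.
        }
        String path = prefix + "/" + node.getNodeName();
        paths.add(path);
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collect(child, path, paths);
        }
    }

    // Jaccard similarity of the two path sets: |A ∩ B| / |A ∪ B|, in [0, 1].
    public static double similarity(String pageA, String pageB) {
        Set<String> a = tagPaths(pageA);
        Set<String> b = tagPaths(pageB);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }
}
```

Calling DomSimilarity.similarity(pageA, pageB) on two pages built from the same template should give a score near 1, while a product page compared with a category page should score noticeably lower. Links could then be grouped by thresholding this score, with the threshold chosen empirically for the site in question.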
