How can I navigate the HTML tree using Jsoup?

I suspect this has been asked before, but I could not find anything.

Starting from a Document in Jsoup, how can I walk through all the elements of the HTML content?

I read the documentation and considered using the childNodes() method, but it only returns the nodes one level below (which makes sense). I could wrap it in some recursion, but I would like to know whether there is a more suitable, built-in way to do this.

+7
3 answers

On a Document (and on any Node), you can use the traverse(NodeVisitor) method, which walks the tree depth-first and calls back for every node.

For example:

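    // Requires: org.jsoup.nodes.Node, org.jsoup.select.NodeVisitor
    document.traverse(new NodeVisitor() {
        @Override
        public void head(Node node, int depth) {
            // Called when the node is first reached, before its children
            System.out.println("Entering tag: " + node.nodeName());
        }

        @Override
        public void tail(Node node, int depth) {
            // Called after all of the node's children have been visited
            System.out.println("Exiting tag: " + node.nodeName());
        }
    });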
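
To make the order of the head and tail callbacks concrete, here is a minimal sketch that reproduces the same depth-first pattern on a small hand-built tree. It deliberately does not use jsoup: `TraverseSketch`, `TinyNode` and `Visitor` are hypothetical names invented for this illustration, and the recursive walk is only an assumption about the shape of the traversal. jsoup's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class TraverseSketch {

    // Hypothetical stand-in for a DOM node; not a jsoup class.
    static class TinyNode {
        final String name;
        final List<TinyNode> children = new ArrayList<>();

        TinyNode(String name, TinyNode... kids) {
            this.name = name;
            for (TinyNode kid : kids) {
                children.add(kid);
            }
        }
    }

    // Hypothetical visitor with the same two callbacks as in the answer above.
    interface Visitor {
        void head(TinyNode node, int depth);
        void tail(TinyNode node, int depth);
    }

    // Depth-first walk: head() fires before the children, tail() after them.
    static void traverse(TinyNode node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (TinyNode child : node.children) {
            traverse(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    // Records the callback order for a small html/head/body/p tree.
    static List<String> trace() {
        TinyNode root = new TinyNode("html",
                new TinyNode("head"),
                new TinyNode("body", new TinyNode("p")));
        List<String> events = new ArrayList<>();
        traverse(root, 0, new Visitor() {
            @Override
            public void head(TinyNode node, int depth) {
                events.add("enter " + node.name + " @" + depth);
            }

            @Override
            public void tail(TinyNode node, int depth) {
                events.add("exit " + node.name + " @" + depth);
            }
        });
        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Each node is entered before any of its children and exited only after all of them, so the depth argument is all you need to rebuild the nesting, for example to indent the output.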
+18

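1) You can select all elements with the * selector. Called on document.body(), it returns the body element and everything below it; call select("*") on the document itself if you also need html and head.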

 Elements elements = document.body().select("*"); 

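2) To obtain the text of each element individually, use the Element.ownText() method. It returns only the text that belongs directly to that element, not the text of its children.

    for (Element element : elements) {
        System.out.println(element.ownText());
    }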

3) To change the content of each element separately, use Element.html(String strHtml). It clears any existing inner HTML in the element and replaces it with the parsed HTML.

 element.html(strHtml); 

Hope this helps you. Thanks!

0

You can use the following code:

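    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupDepthFirst {

        // Returns a comma-separated list of tag names in depth-first order;
        // each tag is listed once on entry and once on exit.
        private static String htmlTags(Document doc) {
            StringBuilder sb = new StringBuilder();
            htmlTags(doc.children(), sb);
            return sb.toString();
        }

        private static void htmlTags(Elements elements, StringBuilder sb) {
            for (Element el : elements) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(el.nodeName());              // entering the element
                htmlTags(el.children(), sb);           // recurse into child elements
                sb.append(",").append(el.nodeName());  // leaving the element
            }
        }

        public static void main(String... args) {
            String s = "<html><head>this is head </head><body>this is body</body></html>";
            Document doc = Jsoup.parse(s);
            System.out.println(htmlTags(doc));
        }
    }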
-1