Jsoup efficient way to remove html elements and

I want to remove the HTML div and table tags together with everything inside them (their children). What is the best way to do this?

I tried traversing the document like this, but it doesn't work, even though the Jsoup documentation says that node.remove() removes the element, along with its children, from the DOM:

    doc.traverse(new NodeVisitor() {
        @Override
        public void head(Node node, int i) {
        }

        @Override
        public void tail(Node node, int i) {
            //Log.i(TAG, "node: " + node.nodeName());
            if (node.nodeName().equals("table") || node.nodeName().equals("div"))
                node.remove();
        }
    });
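A likely reason the traversal above misbehaves: the traversor finds the next node to visit through the current node's parent and siblings, so once node.remove() has detached the node inside tail(), its nextSibling() is null and the walk can stop early or skip nodes. The sketch below avoids changing the tree during the walk: it only collects matching nodes, then removes them after traverse() has returned. The class name CollectThenRemove and the helper removeDivsAndTables are hypothetical; the sketch uses only doc.traverse, NodeVisitor and Node.remove. (As far as I know, jsoup 1.11.1 and later also provide Node.filter with NodeFilter.FilterResult.REMOVE, which is meant for removing nodes during a walk.)

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

public class CollectThenRemove {

    // Hypothetical helper: walks the tree once and remembers every <div>/<table>,
    // then removes them only after the traversal has finished, so the traversal
    // never has to continue from a node that was already detached.
    public static String removeDivsAndTables(String html) {
        Document doc = Jsoup.parse(html);
        final List<Node> toRemove = new ArrayList<>();

        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                String name = node.nodeName();
                if (name.equals("div") || name.equals("table")) {
                    toRemove.add(node);
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do on the way back up
            }
        });

        // Nested matches are safe to remove too: their parent is the
        // (already detached) outer element, so remove() still has a parent.
        for (Node node : toRemove) {
            node.remove(); // detaches the node together with all its children
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>keep</p><div>drop <span>me</span></div>"
                + "<table><tr><td>cell</td></tr></table><p>also keep</p>";
        System.out.println(removeDivsAndTables(html));
    }
}
```

Collecting first costs one extra list but keeps the tree stable while it is being walked, which is the property the original tail()-based removal was missing.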
java html jsoup
2 answers

Have you tried the remove() method of the Elements class?

    Document doc = Jsoup.parse(html);
    doc.select("div").remove();
    doc.select("table").remove();

This selects all the <div> and <table> elements and removes each of them, together with its children, from the document.
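The same Elements.remove() approach can also be written with a single grouped selector, "div, table", which jsoup's CSS selector syntax supports. A minimal runnable sketch, where SelectAndRemove and removeDivsAndTables are hypothetical names. If a table sits inside a div, both end up in the selection; removing the inner one after its parent has been detached is harmless, because it still has that parent.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectAndRemove {

    // Hypothetical helper: selects every <div> and every <table> with one grouped
    // CSS selector, then removes them; each removed element takes its children along.
    public static String removeDivsAndTables(String html) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select("div, table");
        matches.remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>keep</p><div>drop <table><tr><td>nested</td></tr></table></div>";
        System.out.println(removeDivsAndTables(html));
    }
}
```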

    Document doc = Jsoup.parse(html);
    doc.select("table *").remove();
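Note that this selector does something different from the answer above: "table *" matches every element nested inside a <table> (tbody, tr, td, ...) but not the <table> element itself, and it does not touch <div> elements at all. The sketch below (DescendantSelector and removeTableDescendants are hypothetical names) shows that an empty <table> element is left behind while the div survives.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DescendantSelector {

    // Hypothetical helper: "table *" selects all descendants of <table> elements,
    // so the tables themselves remain in the document, just emptied.
    public static String removeTableDescendants(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("table *").remove();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>cell</td></tr></table><div>still here</div>";
        System.out.println(removeTableDescendants(html));
    }
}
```

To drop the tables themselves, select "table" rather than "table *".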
