Removing text enclosed between HTML tags using JSoup

In some cases of HTML cleanup, I would like to keep the text enclosed between the tags (which is Jsoup's default behavior), and in some cases I would like to remove the text as well as the HTML tags. Can someone please shed some light on how I can remove the text enclosed between HTML tags using Jsoup?

html text extract jsoup
2 answers

The Cleaner always drops disallowed tags but keeps the text inside them. If you need to drop whole elements (i.e. the tags together with their text and nested elements), parse the HTML first, remove the elements with remove() (or clear their contents with empty()), and then run the result through the cleaner.

For example:

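    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.safety.Whitelist; // newer jsoup releases rename this class to Safelist

    String html = "Clean <div>Text dropped</div>";
    Document doc = Jsoup.parse(html);
    // If the <div> is not removed here, the cleaner will drop the tag but leave its inner text.
    doc.select("div").remove();
    String clean = Jsoup.clean(doc.body().html(), Whitelist.basic());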
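To illustrate the difference between remove() and empty() mentioned above, here is a minimal sketch. The class and method names (RemoveVsEmpty, removeElements, emptyElements) are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveVsEmpty {
    // Hypothetical helper: drops the matching elements entirely
    // (tags, text, and nested children).
    public static String removeElements(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.select(cssQuery).remove();
        return doc.body().html();
    }

    // Hypothetical helper: keeps the matching tags but discards
    // everything inside them.
    public static String emptyElements(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        doc.select(cssQuery).empty();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p>Keep <span>drop me</span></p>";
        System.out.println(removeElements(html, "span"));
        System.out.println(emptyElements(html, "span"));
    }
}
```

With remove(), the span and its text disappear from the output. With empty(), an empty span is left behind, which the cleaner would then handle according to its list of allowed tags.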
    String html = "<!DOCTYPE html><html><head><title></title></head><body><p>hello there</p></body></html>";
    Document d = Jsoup.parse(html);
    System.out.println(d);
    System.out.println("************************************************");
    d.getElementsByTag("p").remove();
    System.out.println(d);

If you run into problems when working with a separately held Elements collection, you can perform the removal directly on the Document object d, as in the code above.
