Remove pairs of empty tags from an HTML snippet

I have a string containing arbitrary HTML content, such as

"<p></p><div></div><p>Hello<br/>world</p><p></p>" 

I would like to transform this string so that pairs of empty tags are removed, while self-closing tags such as <br/> are kept. For example, the line above should become

 "<p>Hello<br/>world</p>" 

I would like to use JSoup for this, since it is already on my classpath and doing this conversion server-side would be easier for me.

5 answers

Here is an example that does just that (using JSoup):

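    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
    Document doc = Jsoup.parse(html);
    for (Element element : doc.select("*")) {
        // Remove block-level elements that have no text anywhere inside them.
        if (!element.hasText() && element.isBlock()) {
            element.remove();
        }
    }
    System.out.println(doc.body().html());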

The output of the code above is what you are looking for:

 <p>Hello<br />world</p> 

Not very familiar with jsoup, but you can do this with a simple regex:

    String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
    html = html.replaceAll("<([^>]*)></\\1>", "");

Although with a full parser, you could simply remove the empty content during processing, depending on what you are ultimately going to do with it.
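
The one-pass regex above only matches an attribute-free tag that is immediately followed by its own closing tag, so an input like <div><p></p></div> would leave <div></div> behind. A minimal sketch of one way around that, assuming the markup is simple (no ">" inside attribute values) and using only java.util.regex: repeat the replacement until nothing changes. The class name EmptyPairStripper and the method strip are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

final class EmptyPairStripper {
    // An opening tag (optionally with attributes), optional whitespace,
    // then the matching closing tag. Self-closing tags such as <br/> have
    // no closing tag after them, so they are never matched.
    private static final Pattern EMPTY_PAIR = Pattern.compile(
            "<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>\\s*</\\1\\s*>",
            Pattern.CASE_INSENSITIVE);

    // Removes empty pairs repeatedly so that parents which become empty
    // after their children are removed get removed as well.
    static String strip(String html) {
        String previous;
        String current = html;
        do {
            previous = current;
            current = EMPTY_PAIR.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }
}
```

Calling EmptyPairStripper.strip(html) on the string from the question should leave only the paragraph with text in it. This is still a regex over markup, so a real parser remains the safer choice for untrusted or irregular HTML.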

Jsoup will produce well-formed XML from arbitrary HTML. You can then use an XML parser to find and remove all empty tags, which I think is a better idea than a regular expression. See here: Java Remove Empty XML Tags. You can also use JSoup itself to find the empty tags for you. See the selector syntax at http://jsoup.org/cookbook/extracting-data/selector-syntax and use the Node.remove() method.
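
The XML-parser route could look roughly like the sketch below, which uses only the JDK's built-in DOM and Transformer APIs. It assumes the fragment is already well-formed XML (for example after being normalised by Jsoup). Because a DOM cannot tell <br/> from <br></br>, it also assumes a small keep-list of elements that are allowed to be empty. The class name EmptyElementPruner, the method prune, the keep-list contents, and the temporary <root> wrapper are all illustrative choices, not part of any library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EmptyElementPruner {
    // Elements that may legitimately be empty (assumed list, extend as needed).
    private static final Set<String> KEEP_WHEN_EMPTY = Set.of("br", "hr", "img", "input");

    static String prune(String fragment) {
        try {
            // Wrap the fragment so the parser sees a single root element.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            Element root = doc.getDocumentElement();
            pruneChildren(root);
            if (!root.hasChildNodes()) {
                return "";
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(root), new StreamResult(out));
            // Strip the temporary wrapper again.
            String xml = out.toString();
            return xml.substring("<root>".length(), xml.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process fragment as XML", e);
        }
    }

    // Depth-first: children are pruned before their parent is inspected,
    // so a parent that only held empty elements is removed too.
    private static void pruneChildren(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                pruneChildren(child);
                boolean empty = ((Element) child).getElementsByTagName("*").getLength() == 0
                        && child.getTextContent().trim().isEmpty();
                if (empty && !KEEP_WHEN_EMPTY.contains(child.getNodeName())) {
                    parent.removeChild(child);
                }
            }
            child = next;
        }
    }
}
```

EmptyElementPruner.prune(html) can be called on the string from the question. Note that the exact serialisation of the result depends on the JDK's XML serializer, and that input which is not well-formed XML is rejected with an exception rather than repaired.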

If you use jQuery, you can do it like this:

    var tags = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
    $("<div id='mydiv'>" + tags + "</div>").appendTo($('body'));
    $('#mydiv').children().each(function () {
        var elem = $(this);
        if (elem.html() === "") elem.remove();
    });

script: http://jsfiddle.net/LqCx5/2/

I don't know Jsoup, but this also works with a simple JavaScript regular expression. Try the code below.

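    function removeall() {
        // Opening tags whose empty pairs should be removed.
        var tagarray = ["<p>", "<div>"];
        var source = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
        for (var i = 0; i < tagarray.length; i++) {
            // Build the matching closing tag, e.g. "<p>" becomes "</p>".
            var closing = tagarray[i].replace("<", "</");
            // Match the opening tag immediately followed by its closing tag.
            var tagpair = new RegExp(tagarray[i] + closing, "g");
            source = source.replace(tagpair, "");
        }
        alert(source);
    }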