Remove pairs of blank tags from HTML snippet
I have a custom string containing HTML content such as
"<p></p><div></div><p>Hello<br/>world</p><p></p>"
I would like to transform this string so that pairs of empty tags are deleted (but standalone empty tags such as <br/> are retained). For example, the conversion should turn the string above into
"<p>Hello<br/>world</p>"
I would like to use jsoup for this, since it is already on my classpath and doing this conversion server-side would be easier for me.
Here is an example that does just that (using JSoup):
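import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
Document doc = Jsoup.parse(html);
// Remove every block-level element that contains no text at all.
for (Element element : doc.select("*")) {
    if (!element.hasText() && element.isBlock()) {
        element.remove();
    }
}
System.out.println(doc.body().html());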
The output of the code above is what you are looking for (the exact rendering of the <br> tag may differ between jsoup versions):
<p>Hello<br />world</p>
I'm not very familiar with jsoup, but you can do this with a simple regex:
String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
html = html.replaceAll("<([^>]*)></\\1>", "");
Although with a full parser, you could simply remove the empty content during processing, depending on what you are ultimately going to do with it.
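The single replaceAll above only matches a tag pair whose opening tag is identical to its closing name, so it misses pairs with attributes and leaves behind parents that only become empty after their children are removed. A minimal sketch of one way to cover those cases, using only java.util.regex, is shown below. The class name EmptyTagRegex and the method stripEmptyPairs are made up for illustration, and the pattern assumes attribute values never contain "<" or ">" and that tag names match case exactly; for anything messier a real parser is the safer choice.

```java
import java.util.regex.Pattern;

class EmptyTagRegex {
    // Matches <name ...attrs></name> with at most whitespace in between.
    // A standalone tag such as <br/> has no closing partner, so it never matches.
    // Assumption: attribute values contain neither '<' nor '>'.
    private static final Pattern EMPTY_PAIR =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)\\b[^<>]*>\\s*</\\1\\s*>");

    // Hypothetical helper: repeats the replacement until nothing changes,
    // so a parent that becomes empty after its children are removed goes too.
    static String stripEmptyPairs(String html) {
        String previous;
        do {
            previous = html;
            html = EMPTY_PAIR.matcher(html).replaceAll("");
        } while (!html.equals(previous));
        return html;
    }

    public static void main(String[] args) {
        // prints <p>Hello<br/>world</p>
        System.out.println(stripEmptyPairs("<p></p><div></div><p>Hello<br/>world</p><p></p>"));
        // prints <p>kept</p>
        System.out.println(stripEmptyPairs("<div class=\"x\"><p> </p></div><p>kept</p>"));
    }
}
```

Looping until the string stops changing trades a few extra passes for not having to express nesting in the pattern itself.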
jsoup can turn arbitrary HTML into well-formed XML. You can then use an XML parser to find and remove all empty tags; I think this is a better idea than a regular expression. See here: Java Remove Empty XML Tags
You can also have jsoup search for the empty tags for you. See http://jsoup.org/cookbook/extracting-data/selector-syntax and use the Node.remove() method.
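As a rough sketch of the XML-parser route, the following uses only the JDK's built-in DOM and Transformer classes. It assumes the fragment is already well-formed XML (for example after being normalized by jsoup). The class name EmptyElementPruner, the <root> wrapper, and the KEEP whitelist are all assumptions made for this illustration: once parsed, <br/> and <p></p> look identical to a DOM, so the elements that are allowed to stay empty have to be named explicitly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class EmptyElementPruner {
    // Assumption: elements that are legitimately empty and must be kept.
    private static final Set<String> KEEP = Set.of("br", "hr", "img");

    static String strip(String fragment) {
        try {
            // Hypothetical <root> wrapper so the fragment has a single root element.
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")))
                    .getDocumentElement();
            prune(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(root), new StreamResult(out));

            // Strip the wrapper again; a fully emptied root yields no content.
            String xml = out.toString();
            boolean wrapped = xml.startsWith("<root>") && xml.endsWith("</root>");
            return wrapped ? xml.substring(6, xml.length() - 7) : "";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process fragment as XML", e);
        }
    }

    // Depth-first: prune the children, then drop any child left without content.
    private static void prune(Element parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child instanceof Element) {
                Element el = (Element) child;
                prune(el);
                if (!KEEP.contains(el.getTagName().toLowerCase()) && isEmpty(el)) {
                    parent.removeChild(el);
                }
            }
            child = next;
        }
    }

    // True if the element has no element children and only blank text.
    private static boolean isEmpty(Element el) {
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) return false;
            if (n instanceof Text && !n.getTextContent().trim().isEmpty()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(strip("<p></p><div></div><p>Hello<br/>world</p><p></p>"));
    }
}
```

Because children are pruned before their parent is checked, wrappers such as <div><p></p></div> disappear in a single pass.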
If you use jQuery, you can do it like this:
var tags = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
$("<div id='mydiv'>" + tags + "</div>").appendTo($('body'));
// Remove every direct child whose inner HTML is empty.
$('#mydiv').children().each(function () {
    var elem = $(this);
    if (elem.html() === "") elem.remove();
});
Fiddle: http://jsfiddle.net/LqCx5/2/
I don't know jsoup, but the code below also works, using a simple JavaScript regular expression:
function removeall() {
    var tagarray = ["<p>", "<div>"];
    var source = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
    for (var i = 0; i < tagarray.length; i++) {
        // Build the closing tag, e.g. "<p>" becomes "</p>".
        var tag2 = tagarray[i].replace("<", "</");
        var tagpair = new RegExp(tagarray[i] + tag2, "g");
        source = source.replace(tagpair, "");
    }
    alert(source);
}