Regex to remove empty html tags containing only empty child elements

I need to parse an HTML string and remove all elements containing only empty child elements.

Example:

<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P> 

contains no information and must be replaced by <br/>.

I wrote a regular expression like this:

 <\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*> 

but the problem is that it matches only two of the three levels. In the example above, the outermost <P> element is not matched.

Can you help me fix this regex?

2 answers

This regex matches a run of opening tags followed directly by a run of closing tags, which covers your example:

 /(<(?!\/)[^>]+>)+(<\/[^>]+>)+/ 

See a live demo with your example.
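A minimal sketch of how the pattern above could be applied to a string. The global flag, the function name `replaceEmptyRuns`, and the `<br/>` default are assumptions added for illustration, not part of the original answer.

```javascript
// Pattern from the answer, with the `g` flag added (an assumption) so that
// every run of opening tags followed by closing tags is replaced.
const emptyRun = /(<(?!\/)[^>]+>)+(<\/[^>]+>)+/g;

// Hypothetical helper: swaps each matched run for a replacement string.
function replaceEmptyRuns(html, replacement = '<br/>') {
  return html.replace(emptyRun, replacement);
}

const sample =
  '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" ' +
  'LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';

console.log(replaceEmptyRuns(sample));
console.log(replaceEmptyRuns('<p>Hello</p><p><b></b></p>'));
```

Note that the pattern does not check that the opening and closing tags pair up, and it does not allow whitespace between tags. An input such as `<p>text<b></b></p>` loses its final `</p>`, because the closing-tag run also consumes it, so this is probably only safe when empty elements are never nested inside non-empty ones.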


Use jQuery to parse the string and walk the children. For each child, check whether .html() is empty. If it is, delete the current element (or its parent, if you prefer) with .remove().

Do this for each HTML string:

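 var appended = $('.yourparent').append('YOUR HTML STRING');
 appended.children().each(function () {
   // `this` is a raw DOM element, so wrap it in $() before calling jQuery methods
   if ($(this).html() === '') {
     $(this).remove(); // or $(this).parent().remove() to drop the parent instead
   }
 });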

This appends the elements first, then removes any children that are empty.
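The same remove-if-empty idea can be repeated until nothing changes, which handles any nesting depth (the original problem in the question). Below is a rough sketch that works on the string directly, without jQuery. The names `collapseEmptyElements`, `innermostEmpty` and `MARK` are hypothetical, and it assumes the input never contains a NUL character and that attribute values never contain `>`.

```javascript
// Placeholder for an element that has already been judged empty.
// Assumption: the input never contains this character.
const MARK = '\u0000';

// One element whose content is only whitespace and/or placeholders.
// \1 forces the closing tag name to equal the opening one; the `i` flag
// makes that comparison case-insensitive.
const innermostEmpty = /<(\w+)\b[^>]*>[\s\u0000]*<\/\1\s*>/gi;

function collapseEmptyElements(html, replacement = '<br/>') {
  let previous;
  let current = html;
  do {
    previous = current;
    // Each pass turns the innermost empty elements into placeholders,
    // which may make their parents empty for the next pass.
    current = current.replace(innermostEmpty, MARK);
  } while (current !== previous);
  // Finally turn every surviving placeholder into the replacement.
  return current.split(MARK).join(replacement);
}

const nested =
  '<P ALIGN="left"><FONT FACE="Arial" SIZE="12"><B></B></FONT></P>';
console.log(collapseEmptyElements(nested));
console.log(collapseEmptyElements('<p>Hello</p><div><p></p> <p></p></div>'));
```

Because each pass only touches elements that are already empty, elements with real text are left alone, and a whole empty subtree collapses to a single replacement. For anything beyond simple, machine-generated markup, a real HTML parser is likely the safer choice.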
