I need to parse an HTML string and remove all elements containing only empty child elements.
Example:
<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>
contains no information and must be replaced by <br/>.
I wrote a regular expression like this:
<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>
but the problem is that it matches only two of the three nesting levels. In the example above, the outermost <P> element is not selected.
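To reproduce the behavior, here is a minimal sketch in Python. The choice of Python and the names PATTERN, HTML and match are assumptions made for illustration only; the pattern and the input are copied unchanged from above.

```python
import re

# The regular expression from the question, unchanged.
PATTERN = r'<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>'

# The example markup from the question, unchanged.
HTML = ('<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" '
        'LETTERSPACING="0" KERNING="1"><B></B></FONT></P>')

# The pattern allows one opening tag, then any number of directly empty
# child elements, then one closing tag. It therefore describes at most
# two nesting levels, so the search skips <P> and starts at <FONT>.
match = re.search(PATTERN, HTML)

# Prints the <FONT ...><B></B></FONT> part only; <P> is left out.
print(match.group(0))
```

The match begins at the FONT tag because, starting from the P tag, the pattern expects either an empty child or a closing tag next, and FONT is neither: it has a child of its own.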
Can you help me fix this regex?