Regex to remove empty, unnecessary HTML tags

I use TinyMCE (WYSIWYG) as the default editor in one of my projects, and it sometimes automatically adds <p>&nbsp;</p>, <p> </p>, or empty divs.

I searched but could not find a good way to remove empty tags using regular expressions.

The code I tried is:

 $pattern = "/<[^\/>]*>([\s]?)*<\/[^>]*>/";
 $str = preg_replace($pattern, '', $str);

Note: I also want to remove &nbsp; :(

5 answers

Try /<(\w+)>(\s|&nbsp;)*<\/\1>/ instead. :)
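As a rough illustration of how this pattern behaves, here is a minimal sketch. The sample markup is made up for the example, and `~` is used as the delimiter only so that the slash needs no escaping. Note that, as written, the pattern matches only tags without attributes.

```php
<?php
// Pattern from the answer above, with '~' as the delimiter (an assumption
// made for readability; the original uses '/' with an escaped slash).
$pattern = '~<(\w+)>(\s|&nbsp;)*</\1>~';

// Hypothetical editor output, for illustration only.
$html = '<p>Hello</p><p>&nbsp;</p><p> </p><div></div><p class="x"> </p>';

// The backreference \1 forces the closing tag to match the opening tag name.
$clean = preg_replace($pattern, '', $html);

echo $clean, PHP_EOL;
// Prints: <p>Hello</p><p class="x"> </p>
```

The last paragraph survives because `<(\w+)>` does not allow attributes between the tag name and the closing `>`.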


Your regex is a bit odd, but it looks like it might work. Alternatively, you can try:

 $pattern = ':<[^/>]*>\s*</[^>]*>:';
 $str = preg_replace($pattern, '', $str);

It is very similar, but using : as the delimiter means the slashes do not need escaping.


I know this is not what you asked, but after several months of TinyMCE, dealing not only with this but also with the mess that results when users paste directly from Word, I switched to FCKeditor and could not be happier.

EDIT: In case this is unclear, I am saying that FCKeditor does not insert arbitrary paragraphs wherever it feels like it, and it copes with pasted Word crap out of the box. You may find my previous question helpful.


You need several regexes, each targeting a specific case, to be sure you do not remove other elements that you need.

As Ben said, a single generic regex can discard valid elements:

 <\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>
 <\s*p\s*>\s*<\s*/p\s*>
 <\s*div\s*>\s*<\s*/div\s*>
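One way to apply several targeted patterns in a single call is to pass them to `preg_replace` as an array. The sketch below is a variation on the patterns above rather than a literal copy: it folds the `&nbsp;` case into the per-tag patterns, and the choice of `p` and `div` as well as the sample markup are assumptions made for illustration.

```php
<?php
// Targeted patterns: only empty <p> and <div> elements are removed.
// Folding &nbsp; into each pattern is an assumption, not the answer's exact form.
$patterns = [
    '~<\s*p\s*>(?:\s|&nbsp;)*<\s*/p\s*>~i',
    '~<\s*div\s*>(?:\s|&nbsp;)*<\s*/div\s*>~i',
];

// Hypothetical input: the empty <textarea> is valid markup that should stay.
$html = '<div><p>&nbsp;</p><textarea></textarea></div><div> </div>';

// preg_replace applies each pattern in turn when given an array.
$clean = preg_replace($patterns, '', $html);

echo $clean, PHP_EOL;
// Prints: <div><textarea></textarea></div>
```

Because only named tags are targeted, elements such as an empty `<textarea>` are left alone, which is the point of avoiding one generic pattern.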

Try the following:

 <([\w]+)[^>]*?>(\s|&nbsp;)*<\/\1> 
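A single pass with this pattern can leave behind a wrapper that only becomes empty once its child has been removed. The sketch below repeats the replacement until nothing changes. The function name `strip_empty_tags` and the sample markup are hypothetical, and `~` is used as the delimiter so the slash needs no escaping.

```php
<?php
// Hypothetical helper: repeats the replacement until no more matches are found,
// so that elements emptied by an earlier pass are removed as well.
function strip_empty_tags(string $html): string
{
    // Pattern from the answer above: allows attributes on the opening tag.
    $pattern = '~<(\w+)[^>]*?>(\s|&nbsp;)*</\1>~';

    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

// Illustrative input: the <div> is empty only after the inner <p> is removed.
$html = '<div class="a"><p>&nbsp;</p> </div><p>Text</p>';
$result = strip_empty_tags($html);

echo $result, PHP_EOL;
// Prints: <p>Text</p>
```

Keep in mind that a generic pattern like this also removes elements that are legitimately empty, such as `<td></td>` or `<textarea></textarea>`, so it may need to be restricted to specific tag names.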