
Remove unnecessary paragraph tags from a string

If I have a string like:

<p>&nbsp;</p> <p></p> <p class="a"><br /></p> <p class="b">&nbsp;</p> <p>blah blah blah this is some real content</p> <p>&nbsp;</p> <p></p> <p class="a"><br /></p> 

How can I turn it into simply:

 <p>blah blah blah this is some real content</p> 

The regular expression should also match &nbsp; and whitespace.

+4
3 answers
  $result = preg_replace('#<p[^>]*>(\s|&nbsp;?)*</p>#', '', $input);

This does not catch literal non-breaking space characters in the input, but those are very rarely seen.
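
If literal non-breaking spaces do turn up, one possible extension is sketched below. It assumes the input is UTF-8 and PHP 7 or later; the function name `removeEmptyParagraphs` is made up for illustration and is not part of the original answer.

```php
<?php
// Sketch only: the same idea as the one-liner above, extended to also match
// a literal U+00A0 character. Assumes UTF-8 input and PHP 7+.
function removeEmptyParagraphs(string $html): string
{
    // u modifier: treat pattern and subject as UTF-8, so \x{00A0} is one code point.
    $result = preg_replace('#<p[^>]*>(?:\s|&nbsp;?|\x{00A0})*</p>#u', '', $html);
    // preg_replace returns null on failure (e.g. invalid UTF-8); keep the input then.
    return $result ?? $html;
}

$input = "<p>\u{00A0}</p><p>&nbsp;</p><p>real content</p>";
echo removeEmptyParagraphs($input), "\n"; // <p>real content</p>
```

Like the original pattern, this still leaves paragraphs that contain only a `<br />` untouched.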

Since you are dealing with HTML, if this is user input, I would suggest using HTML Purifier, which will also deal with XSS vulnerabilities. The configuration directive you want for removing empty tags is %AutoFormat.RemoveEmpty.
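
A minimal sketch of that configuration follows. It assumes the HTML Purifier library is already loaded (for example through Composer's autoloader); `purifyRemovingEmpty` is a hypothetical helper name. The additional `AutoFormat.RemoveEmpty.RemoveNbsp` directive is included on the assumption that paragraphs holding only `&nbsp;` should count as empty too.

```php
<?php
// Assumes the HTML Purifier library is already loaded, for example through
// Composer's autoloader; purifyRemovingEmpty() is a hypothetical helper name.
function purifyRemovingEmpty(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Drop elements that have no meaningful content.
    $config->set('AutoFormat.RemoveEmpty', true);
    // Also treat elements holding only non-breaking spaces as empty.
    $config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}
```

Whether a paragraph containing only `<br />` is removed by these settings is not checked here, so test against your own input before relying on it.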

+15

This regex will work against your example:

  <p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>
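
For illustration, here is one way that pattern might be applied from PHP to the string in the question. This is a sketch, not the answerer's code: the helper name `stripBlankParagraphs` is hypothetical, the inner `+` quantifiers are flattened into a single repeated group to avoid nested repetition, and `\b` plus the `i` flag are added as assumptions so that `<pre>` is left alone and uppercase tags also match.

```php
<?php
// stripBlankParagraphs() is a hypothetical helper name used for illustration.
function stripBlankParagraphs(string $html): string
{
    // <p\b[^>]*>  opening <p> with optional attributes (\b keeps <pre> out)
    // (?:...)*    any mix of whitespace, &nbsp; entities, and <br> / <br /> tags
    // </p>        closing tag; the i flag makes the match case-insensitive
    $pattern = '#<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>#i';
    $result = preg_replace($pattern, '', $html);
    // preg_replace returns null on failure; fall back to the input unchanged.
    return trim($result ?? $html);
}

$input = '<p>&nbsp;</p> <p></p> <p class="a"><br /></p> <p class="b">&nbsp;</p> '
       . '<p>blah blah blah this is some real content</p> '
       . '<p>&nbsp;</p> <p></p> <p class="a"><br /></p> ';

echo stripBlankParagraphs($input), "\n";
// <p>blah blah blah this is some real content</p>
```

The `trim()` call removes the spaces left behind between the deleted paragraphs at either end of the string.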
+5

As the original responder stated, a regex isn't the best solution here; what you want is some kind of HTML stripper.

There is a function for this on this site: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page

That should help you out; you will just need to manipulate the string a bit afterward to get the newlines and so on back into the desired format.

+1
