Remove all classes from p tags

I'm just wondering if anyone knows the function to remove ALL classes from a string in php. I basically only want

<p> 

but not

 <p class="..."> 

If that makes sense :)

+6
php class strip
6 answers

A fairly naive regex will probably work for you

 $html=preg_replace('/class=".*?"/', '', $html); 

I say naive because it will also strip class="something" wherever that text appears, including in your body text. It could be made more reliable by matching class="..." only inside angle brackets, if necessary.
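As a rough sketch of the "only inside angle brackets" idea: first match each opening p tag, then strip the class attribute only within that match. The function name stripClassInsidePTags is made up for illustration, and this version only handles double-quoted values.

```php
<?php
// Hypothetical helper: removes class="..." only inside opening <p ...> tags.
function stripClassInsidePTags(string $html): string
{
    return preg_replace_callback(
        '/<p\b[^>]*>/i',                 // one opening <p ...> tag at a time
        function (array $m): string {
            // Strip the double-quoted class attribute inside this tag only.
            return preg_replace('/\s+class\s*=\s*"[^"]*"/i', '', $m[0]);
        },
        $html
    );
}

$html = '<p class="intro" id="a">Set class="x" on it.</p>';
echo stripClassInsidePTags($html), "\n";
// prints: <p id="a">Set class="x" on it.</p>
```

The body text is left alone because the inner replacement never sees anything outside the matched tag.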

+8

It may be overkill for your needs, but for parsing / validating / cleaning HTML data, the best tool I know is HTML Purifier.

It lets you specify which tags and attributes are allowed and/or which are not, and it produces valid, clean (X)HTML as output.

(Using regular expressions to parse HTML seems OK at the beginning... and then, once you want to handle a few more cases, it usually becomes hell to understand and maintain.)

+2

Load the HTML into a DOMDocument, then import it into SimpleXML. Run an XPath query for all p elements and loop over them. In each iteration, set the class attribute to a marker value such as "killmeplease".

When that's done, serialize the SimpleXML object back out as XML (which, by the way, may alter the HTML, though usually only for the better), and you will have an HTML string in which every p has a class of "killmeplease". Use str_replace to remove them.

Example:

 $html_file = "somehtmlfile.html";
 $dom = new DOMDocument();
 $dom->loadHTMLFile($html_file);
 $xml = simplexml_import_dom($dom);
 // Mark every p element's class attribute with a known value.
 $paragraphs = $xml->xpath("//p");
 foreach ($paragraphs as $paragraph) {
     $paragraph['class'] = "killmeplease";
 }
 // Serialize, then strip the marker attribute (leading space included).
 $new_html = $xml->asXML();
 $better_html = str_replace(' class="killmeplease"', "", $new_html);
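A variation on the same DOM idea, offered as a sketch rather than what this answer does: DOMElement::removeAttribute can drop the attribute directly, which skips the marker-and-str_replace step. The function name removeParagraphClasses and the wrapper div are assumptions made for this example, and it assumes ASCII input (other encodings need extra handling with loadHTML).

```php
<?php
// Hypothetical helper: strips class attributes from <p> elements via the DOM.
function removeParagraphClasses(string $html): string
{
    $dom = new DOMDocument();
    // Silence parser warnings about imperfect markup while loading.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so it can be found again after parsing.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('p') as $p) {
        $p->removeAttribute('class');
    }

    // Serialize only the children of the wrapper div, not the whole document.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeParagraphClasses('<p class="a b" id="x">Hi</p>'), "\n";
```

Because the parser normalizes the markup, the output can differ from the input in small ways (attribute quoting, for instance).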

Or, if you want simpler code and don't mind wrestling with preg_replace, you can go with:

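 $html_file = "somehtmlfile.html";
 $html_string = file_get_contents($html_file);
 // Capture what comes before and after the class attribute of a p tag.
 $bad_p_class = '/(<p\b[^>]*?)\s+class\s*=\s*("[^"]*"|\'[^\']*\')([^>]*>)/i';
 $better_html = preg_replace($bad_p_class, '$1$3', $html_string);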

The tricky part with regular expressions is that quantifiers are greedy by default, and . does not match a line break unless you add the s modifier, so a pattern can misbehave if your p tag spans more than one line. But give one of them a shot.
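To make the greediness and line-break points concrete, here is a small demonstration. The sample strings and variable names are invented for illustration.

```php
<?php
$html = '<p class="a">one</p><p class="b">two</p>';

// Greedy: .* runs to the LAST double quote, swallowing the first paragraph.
$greedy = preg_replace('/ class=".*"/', '', $html);
// $greedy is '<p>two</p>'

// Lazy: .*? stops at the first closing quote.
$lazy = preg_replace('/ class=".*?"/', '', $html);
// $lazy is '<p>one</p><p>two</p>'

// An attribute value that contains a line break.
$multiline = "<p class=\"a\nb\">x</p>";

// Without the s modifier, . cannot cross the newline, so nothing matches.
$withoutS = preg_replace('/ class=".*?"/', '', $multiline);
// $withoutS is unchanged

// With the s modifier, . also matches the newline.
$withS = preg_replace('/ class=".*?"/s', '', $multiline);
// $withS is '<p>x</p>'
```

A negated character class such as [^"]* sidesteps both problems, since it can never run past the closing quote and it matches newlines.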

+2
 $html = "<p id='fine' class='r3e1 b4d 1' style='widows: inherit;'>";
 $html = preg_replace('/\sclass=[\'"][^\'"]+[\'"]/', '', $html);

If you are dealing with HTML exported from Microsoft Office, you will need more than class removal, but HTML Tidy has a configuration flag just for Microsoft Office!

Otherwise, this should be safer than some of the other answers, given that they are a bit greedy and you don't know which quote character will be used ( ' or " ).

Note: The pattern is actually /\sclass=['"][^'"]+['"]/ , but since it contains both double quotes ( " ) and apostrophes ( ' ), I had to escape every occurrence of one of them ( \' ) to put the pattern in a PHP string.
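One possible refinement, sketched here as an assumption rather than part of this answer: capture the opening quote and require the same character to close the value with a backreference, so a value such as class="it's" is handled too. The function name stripClassAttributes is hypothetical.

```php
<?php
// Hypothetical helper: removes class attributes quoted with either ' or ".
function stripClassAttributes(string $html): string
{
    // (["\']) captures the opening quote; \1 demands the same closing quote.
    // The s modifier lets the value span a line break.
    return preg_replace('/\sclass\s*=\s*(["\']).*?\1/s', '', $html);
}

$html = "<p id='fine' class='r3e1 b4d 1' style='widows: inherit;'>";
echo stripClassAttributes($html), "\n";
// prints: <p id='fine' style='widows: inherit;'>
```

Like the pattern above, this still matches class="..." in body text, so it is only as safe as the input allows.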

+2

I would do something like this in jQuery. Put this in your page's head:

 $(document).ready(function () {
     $("p").each(function () {
         $(this).removeAttr("class"); // or $(this).removeClass("className");
     });
 });

+1

HTML Purifier

HTML can be too complex for regular expressions because of the hundreds of different ways the markup can be written or formatted.

HTML Purifier is a mature open source library for cleaning HTML. I would advise using it in this case.

The HTML Purifier configuration documentation explains how to specify which classes and attributes should be allowed, and what the purifier should do when it finds them.

http://htmlpurifier.org/docs/
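For what it's worth, a minimal sketch of what that configuration might look like. It assumes HTML Purifier is already loaded (for example through Composer's autoloader), and the wrapper function name is made up for illustration; check the configuration docs linked above before relying on it.

```php
<?php
// Hypothetical wrapper around HTML Purifier.
// Assumption: the library has already been loaded or is autoloadable.
function stripParagraphClassesWithPurifier(string $html): string
{
    if (!class_exists('HTMLPurifier')) {
        throw new RuntimeException('HTML Purifier is not loaded.');
    }
    $config = HTMLPurifier_Config::createDefault();
    // Forbid the class attribute on p elements (tag@attr syntax).
    $config->set('HTML.ForbiddenAttributes', 'p@class');
    $purifier = new HTMLPurifier($config);
    // purify() also applies the library's default filtering to the input.
    return $purifier->purify($html);
}
```

Keep in mind that the purifier filters everything else in the input according to its defaults as well, so the output may lose more than just class attributes.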

+1
