strip_tags(): disallow only certain tags

Question

According to the strip_tags documentation, the second parameter takes the allowable tags. In my case, however, I want to do the reverse. Say I want to keep the tags that would normally be accepted, but strip only the <script> tag. Is there any way to do this?

I'm not asking anyone to code it for me; pointers on possible ways to achieve this (if it is possible) would be much appreciated.

+6

html php strip-tags

Leandro garcia Sep 11 '12 at 3:39

5 answers

First, see what others have said on this topic:

Strip <script> tags and everything between PHP?

and

remove script tag from HTML content

It seems you have two options. One is a regex solution, which both of the links above provide. The other is to use an HTML cleaner.

If you are stripping the script tag for some reason other than sanitizing user content, a regex might be a good solution. However, as everyone has warned, it is better to use an HTML cleaner if you are sanitizing input.

+2

Todd moses Sep 11 '12 at 4:12

PHP solution (PHP 5 or later):

If you want to remove the <script> tags (or any others) and also remove the contents inside the tags, you should use:

OPTION 1 (simplest):

 $text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);

OPTION 2 (more universal):

 <?php
 $html = "<p>Your HTML code</p><script>With malicious code</script>";

 $dom = new DOMDocument();
 $dom->loadHTML($html);

 // The node list is live, so collect the nodes first and remove them afterwards.
 $remove = [];
 foreach ($dom->getElementsByTagName('script') as $item) {
     $remove[] = $item;
 }
 foreach ($remove as $item) {
     $item->parentNode->removeChild($item);
 }

 $html = $dom->saveHTML();

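Then $html will no longer contain the <script> element. Inside the DOCTYPE and <html><body> wrapper that DOMDocument adds, what remains is:

 <p>Your HTML code</p>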

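To get back just the fragment rather than a whole document, OPTION 2 can be wrapped in a small function. This is only a sketch: the name remove_forbidden_elements() and its parameters are made up for illustration, and it assumes the input is an HTML fragment that ends up inside <body> after parsing.

```php
<?php
// Hypothetical helper (not from the answer above): drop every element whose
// tag name is listed in $forbidden, contents included, and return what is left.
function remove_forbidden_elements(string $html, array $forbidden): string
{
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($forbidden as $tag) {
        // Copy the live node list into an array before removing anything.
        foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // loadHTML() wraps a fragment in <html><body>; serialize only the body's children.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$clean = remove_forbidden_elements(
    '<p>Your HTML code</p><script>With malicious code</script>',
    ['script']
);
echo $clean;
```

Keep in mind that loadHTML() does not assume UTF-8 by default, so non-ASCII input may need an explicit charset declaration before this is usable on real content.
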
+1

Villapalos Apr 12 '16 at 15:51

This is what I use to strip a list of forbidden tags. It can either remove just the tags that wrap content, or remove tags together with their content, and it trims the leftover whitespace.

 $description = trim(preg_replace([
     // Strip the tags only and keep the content they wrap
     '/<!doctype\b[^>]*>/i',
     '/<\/?html\b[^>]*>/i',
     '/<\/?head\b[^>]*>/i',
     '/<\/?body\b[^>]*>/i',
     // Strip the tags together with the content inside
     '/<script\b[^>]*>.*?<\/script>/is',
 ], '', $description));

Input Example:

 $description = '<html> <head> </head> <body> <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p> <script type="application/javascript">alert(\'Hello world\');</script> </body> </html>';

Output result:

 <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>

0

Marc newton Mar 15 '16 at 10:35

I am using the following:

 function strip_tags_with_forbidden_tags($input, $forbidden_tags)
 {
     foreach (explode(',', $forbidden_tags) as $tag) {
         // Accept "g" as well as "<g>" by dropping the angle brackets.
         $tag = preg_replace(array('/^</', '/>$/'), array('', ''), $tag);
         // Remove the tag pair but keep its text content.
         $input = preg_replace(sprintf('/<%s[^>]*>([^<]+)<\/%s>/', $tag, $tag), '$1', $input);
     }
     return $input;
 }

Then you can do:

 echo strip_tags_with_forbidden_tags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', 'cancel,g');

Output: 'abcxpto<p>def></p>xyz<t>xpto</t>'

 echo strip_tags_with_forbidden_tags('<cancel>abc</cancel> xpto <p>def></p> <g>xyz</g> <t>xpto</t>', 'cancel,g');

Outputs: 'abc xpto <p>def></p> xyz <t>xpto</t>'
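
The function above only matches a forbidden tag when its content is plain, non-empty text (that is what [^<]+ requires), so a forbidden element that is empty or wraps other markup is left alone. A possible variant, sketched here under a made-up name, strips just the opening and closing tags on their own and keeps whatever sits between them. Like the original, it is a convenience for trusted markup, not a sanitizer.

```php
<?php
// Hypothetical variant: strip only the opening and closing tags named in
// $forbidden_tags and keep everything between them, nested markup included.
function strip_forbidden_tag_markers(string $input, array $forbidden_tags): string
{
    // Normalize "<g>", "g" or " g " to a bare, regex-safe tag name.
    $names = array_filter(array_map(function ($tag) {
        return preg_quote(trim($tag, "<>/ \t\n"), '/');
    }, $forbidden_tags), 'strlen');

    if (!$names) {
        return $input;
    }

    // \b keeps "g" from matching a longer name; [^>]* skips any attributes.
    $pattern = '/<\/?(?:' . implode('|', $names) . ')\b[^>]*>/i';
    return preg_replace($pattern, '', $input);
}

echo strip_forbidden_tag_markers(
    '<cancel>a<b>b</b>c</cancel>xpto<g></g><t>xpto</t>',
    ['cancel', 'g']
);
// prints: a<b>b</b>cxpto<t>xpto</t>
```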

0

Amadu bah Apr 27 '16 at 15:53

Jared farrish · Accepted Answer · 2012-09-11T06:39:27+0000

EDIT

To use the HTML Purifier HTML.ForbiddenElements config directive, it looks like you would do something like:

 require_once '/path/to/HTMLPurifier.auto.php';

 $config = HTMLPurifier_Config::createDefault();
 $config->set('HTML.ForbiddenElements', array('script','style','applet'));
 $purifier = new HTMLPurifier($config);
 $clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

 array('script','style','applet')

Or:

 array('<script>','<style>','<applet>')

Or something else?

I think it is the first form, without delimiters; HTML.AllowedElements uses a configuration string form, which is somewhat similar to TinyMCE's valid_elements syntax:

 tinyMCE.init({
     ...
     valid_elements : "a[href|target=_blank],strong/b,div[align],br",
     ...
 });

So I think it is just the element name, with no attributes provided (since you are forbidding the whole element ... although there is HTML.ForbiddenAttributes, too). But this is an assumption.

I will add this note from the HTML.ForbiddenAttributes documentation:

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

Blacklisting is simply not as “reliable” as whitelisting, but you may have your own reasons. Just beware and be careful.

Without testing, I'm not sure what else to tell you. I'll keep looking for an answer, but I'll probably go to bed first. It's very late. :)

Although I think you really should use HTML Purifier and its HTML.ForbiddenElements, a reasonable alternative, if you really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you don't want from the full list, and then use what is left.

For instance:

 function blacklistElements($blacklisted = '', &$errors = array())
 {
     if ((string)$blacklisted == '') {
         $errors[] = 'Empty string.';
         return array();
     }

     $html5 = array(
         "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
         "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
         "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
         "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
         "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
         "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
         "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
         "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
         "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
         "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
         "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
         "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
         "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
         "<title>","<head>","<html>"
     );

     // Normalize the input to a list like array('<html>', '<em>', ...).
     $list = trim(strtolower($blacklisted));
     $list = preg_replace('/[^a-z ]/i', '', $list);
     $list = '<' . str_replace(' ', '> <', $list) . '>';
     $list = array_map('trim', explode(' ', $list));

     // Whatever is not blacklisted is the whitelist.
     return array_diff($html5, $list);
 }

Then run it:

 $blacklisted = '<html> <bogus> <EM> em li ol';
 $errors = array();
 $whitelist = blacklistElements($blacklisted, $errors);

 if (count($errors)) {
     echo "There were errors.\n";
     print_r($errors);
     echo "\n";
 } else {
     // Do strip_tags() ...
 }

http://codepad.org/LV8ckRjd

So, if you pass in what you do not want to allow, it returns the remaining HTML5 elements as an array, which you can then join into a string and pass to strip_tags():

 $stripped = strip_tags($html, implode('', $whitelist));
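
A compact, self-contained illustration of the same idea follows. The short $known list is a stand-in I made up for the full $html5 array, so treat it as a sketch. It also shows two things worth knowing before relying on this: strip_tags() removes the <script> tags but leaves the text between them, and any tag missing from the known list gets stripped as well.

```php
<?php
// Hypothetical, deliberately short list; a real one would cover every element you accept.
$known     = ['<p>', '<a>', '<em>', '<strong>', '<ul>', '<li>', '<script>'];
$blacklist = ['<script>'];

// Whitelist = known tags minus blacklisted tags.
$whitelist = array_diff($known, $blacklist);

$html     = '<p>Hello <em>world</em></p><script>alert(1)</script><blink>x</blink>';
$stripped = strip_tags($html, implode('', $whitelist));

// <p> and <em> survive; the <script> and <blink> tags are gone, but their text stays.
echo $stripped;
```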

Caveat emptor

I more or less hacked this together, and I know there are some problems I have not thought through yet. For example, from the strip_tags() manual page for the $allowable_tags argument:

Note:
This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.

It's late and, for some reason, I can't quite work out what this means for this approach, so I'll have to think about it tomorrow. I compiled the list of HTML elements in the function's $html5 array from this MDN page. A keen reader may notice that all the tags are in this form:

 <tagName>

I'm not sure how this will affect the outcome, or whether variations such as the self-closing short tag <tagName/> and some of the, er, odder variations need to be accounted for. And, of course, there are more tags out there.

So this is probably not ready for production. But you have an idea.
