strip_tags(): disallow only some tags

According to the strip_tags() documentation, its second parameter takes the allowable tags. In my case, however, I want to do the opposite: instead of listing the tags to keep, I want to name the tags to strip (for example, only <script>) and leave the rest alone. Is there any way to do this?

I am not asking anyone to write the code for me; pointers to possible ways of achieving this (if it is possible at all) would be much appreciated.

+6
5 answers

EDIT

To use HTML Purifier's HTML.ForbiddenElements config directive, it looks like you would do something like:

    require_once '/path/to/HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.ForbiddenElements', array('script', 'style', 'applet'));
    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. I do not know what form the array members should take:

 array('script','style','applet') 

Or:

 array('<script>','<style>','<applet>') 

Or something else?

I think it is the first form, without the angle brackets; HTML.AllowedElements uses a configuration string form that has something in common with TinyMCE's valid_elements syntax:

    tinyMCE.init({
        ...
        valid_elements : "a[href|target=_blank],strong/b,div[align],br",
        ...
    });

So I think each member is just the element name, with no attributes supplied (since you are forbidding the whole element ... although there is HTML.ForbiddenAttributes, too). But this is an assumption.

I will add this note from the HTML.ForbiddenAttributes documentation:

Warning: This directive complements %HTML.ForbiddenElements, so read that directive's documentation for a discussion of why you should think twice before using this directive.

Blacklisting is simply not as "reliable" as whitelisting, but you may have your own reasons. Just beware and be careful.
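
To make that warning concrete, here is a small sketch (my own illustration with made-up markup, not something from the HTML Purifier docs) of why forbidding only <script> is not enough when you rely on strip_tags(): it never inspects attributes, so an inline event handler on an allowed tag survives, and the text that was inside <script> is kept as well.

```php
<?php
// Made-up input: an "allowed" tag carrying an event handler, plus a script.
$dirty = '<img src="x.png" onerror="alert(1)"><script>alert(2)</script>';

// Allow <img> only. The <script> tags are removed, but the onerror
// attribute and the script's text content are left in place.
$clean = strip_tags($dirty, '<img>');

echo $clean, "\n"; // <img src="x.png" onerror="alert(1)">alert(2)
```

So even a perfect list of forbidden elements would still leave attribute-based vectors to deal with, which is the kind of thing a dedicated sanitizer handles.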

Without testing, I'm not sure what to tell you. I will keep looking for an answer, but I will most likely go to bed first. It is late. :)


Although I think you really should use HTML Purifier and its HTML.ForbiddenElements directive, a reasonable alternative, if you really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you do not want from the full list of elements, and then allow what is left.

For instance:

    function blacklistElements($blacklisted = '', &$errors = array()) {
        if ((string)$blacklisted == '') {
            $errors[] = 'Empty string.';
            return array();
        }

        $html5 = array(
            "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
            "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
            "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
            "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
            "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
            "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
            "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
            "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
            "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
            "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
            "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
            "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
            "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
            "<title>","<head>","<html>"
        );

        // Normalize the input: lowercase it, drop everything except letters
        // and spaces, then wrap each name in angle brackets.
        $list = trim(strtolower($blacklisted));
        $list = preg_replace('/[^a-z ]/i', '', $list);
        $list = '<' . str_replace(' ', '> <', $list) . '>';
        $list = array_map('trim', explode(' ', $list));

        // Whatever is not blacklisted forms the whitelist.
        return array_diff($html5, $list);
    }

Then run it:

    $blacklisted = '<html> <bogus> <EM> em li ol';
    $errors = array();
    $whitelist = blacklistElements($blacklisted, $errors);

    if (count($errors)) {
        echo "There were errors.\n";
        print_r($errors);
        echo "\n";
    } else {
        // Do strip_tags() ...
    }

http://codepad.org/LV8ckRjd

So, if you pass in what you do not want to allow, it returns the remaining HTML5 elements as an array, which you can then pass to strip_tags() after joining it into a string:

    $stripped = strip_tags($html, implode('', $whitelist));
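
The same idea can be sketched end to end on a much smaller element list. The helper name allowedTagsExcept() and the five-element $known list below are hypothetical, chosen only to keep the example short; the point is to show what strip_tags() actually returns.

```php
<?php
// Hypothetical helper: build the allow-list string that strip_tags() expects
// from a list of known element names minus a deny-list.
function allowedTagsExcept(array $known, array $denied): string
{
    $denied = array_map('strtolower', $denied);
    $kept = array_diff($known, $denied);

    // strip_tags() wants a string such as "<p><em><a>".
    return $kept === [] ? '' : '<' . implode('><', $kept) . '>';
}

// Deliberately tiny list of lowercase names, for illustration only.
$known = ['p', 'em', 'strong', 'a', 'script'];

$allowed = allowedTagsExcept($known, ['SCRIPT']);
$html = '<p>Hello <em>world</em></p><script>alert(1)</script>';
$result = strip_tags($html, $allowed);

echo $allowed, "\n"; // <p><em><strong><a>
echo $result, "\n";  // <p>Hello <em>world</em></p>alert(1)
```

Note that the <script> tags are gone but their text content (alert(1)) is still there: strip_tags() removes tags, not what is between them.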

Caveat emptor

Now, I hacked this together rather quickly, and I know there are some issues I have not thought through yet. For example, from the strip_tags() manual page, for the $allowable_tags argument:

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or > . It means that strip_tags("<br/>", "<br>") returns an empty string.

It's late and, for some reason, I can't quite work out what this means for this approach, so I will have to think about it tomorrow. I also compiled the list of HTML elements in the function's $html5 array from this MDN page. A keen reader may notice that all the tags are in this form:

 <tagName> 

I'm not sure how this will affect the outcome, or whether I need to account for variations such as the self-closing shorttag <tagName/> and some other, er, odder variations. And, of course, there are more tags out there.
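
One quick way to probe how strip_tags() matches tag names against the allow-list is a throwaway test like the one below. This is only a sketch with made-up input; it checks case-insensitivity and attribute handling, not the self-closing variations mentioned above.

```php
<?php
// Made-up input mixing upper-case tags, an attribute and a void element.
$html = '<A HREF="/x">link</A> <B>bold</B> <br>';

// The allow-list is written in lower case and without attributes.
$kept = strip_tags($html, '<a><br>');

// Tag names are compared case-insensitively, and allowed tags are
// passed through exactly as written, attributes included.
echo $kept, "\n"; // <A HREF="/x">link</A> bold <br>
```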

So this is probably not ready for production. But you have an idea.

+5

First, see what others have said on this topic:

Strip <script> tags and everything between PHP?

and

remove script tag from HTML content

It seems you have two options. One is a regex solution, and both of the links above provide one. The other is to use an HTML sanitizer such as HTML Purifier.

If you are removing the script tag for some reason other than sanitizing user-supplied content, a regex might be a good solution. However, as everyone has warned, you should use an HTML sanitizer if you are sanitizing input.

+2

Solution for PHP (5 or later):

If you want to remove the <script> tags (or any others) and also remove the content inside them, you should use:

OPTION 1 (simplest):

    $text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);

OPTION 2 (more universal):

    <?php
    $html = "<p>Your HTML code</p><script>With malicious code</script>";

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $script = $dom->getElementsByTagName('script');

    // The node list is live, so collect the nodes first and remove them afterwards.
    $remove = [];
    foreach ($script as $item) {
        $remove[] = $item;
    }
    foreach ($remove as $item) {
        $item->parentNode->removeChild($item);
    }

    $html = $dom->saveHTML();

Then $html will contain the following, inside the DOCTYPE and <html><body> wrapper that DOMDocument adds:

 "<p>Your HTML code</p>" 
+1
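
Option 2 above extends naturally to a list of forbidden tags. The following is only a sketch: the helper name removeElements() is hypothetical, it assumes the dom extension is available, and it wraps the fragment in <body> so that only the body's children are serialized back (without the DOCTYPE/<html> wrapper). loadHTML() also makes its own assumptions about character encoding, so treat non-ASCII input with care.

```php
<?php
// Hypothetical helper: remove every element named in $tags, contents included.
function removeElements(string $html, array $tags): string
{
    // Collect parser warnings (e.g. for HTML5 tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML('<body>' . $html . '</body>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tags as $tag) {
        // Copy the live node list first; removing while iterating skips nodes.
        $nodes = iterator_to_array($dom->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Serialize only the children of <body>, not the whole document.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

$html = '<p>Your HTML code</p><script>With malicious code</script>';
$out = removeElements($html, ['script', 'style']);
echo $out, "\n";
```

The serializer may insert line breaks between block elements, so compare the result loosely rather than byte for byte.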

This is what I use to strip a list of forbidden tags. It can either remove only the tags that wrap content, or remove the tags together with their content, and it trims the leftover whitespace.

    $description = trim(preg_replace([
        # Strip tags around content
        '/\<(.*)doctype(.*)\>/i',
        '/\<(.*)html(.*)\>/i',
        '/\<(.*)head(.*)\>/i',
        '/\<(.*)body(.*)\>/i',
        # Strip tags and content inside
        '/\<(.*)script(.*)\>(.*)<\/script>/i',
    ], '', $description));

Example input:

    $description = '<html>
    <head>
    </head>
    <body>
        <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>
        <script type="application/javascript">alert(\'Hello world\');</script>
    </body>
    </html>';

Output result:

 <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p> 
0

I am using the following:

    function strip_tags_with_forbidden_tags($input, $forbidden_tags)
    {
        foreach (explode(',', $forbidden_tags) as $tag) {
            $tag = preg_replace(array('/^</', '/>$/'), array('', ''), $tag);
            $input = preg_replace(sprintf('/<%s[^>]*>([^<]+)<\/%s>/', $tag, $tag), '$1', $input);
        }
        return $input;
    }

Then you can do:

 echo strip_tags_with_forbidden_tags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', 'cancel,g'); 

Output: 'abcxpto<p>def></p>xyz<t>xpto</t>'

 echo strip_tags_with_forbidden_tags('<cancel>abc</cancel> xpto <p>def></p> <g>xyz</g> <t>xpto</t>', 'cancel,g'); 

Outputs: 'abc xpto <p>def></p> xyz <t>xpto</t>'
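
The function above only matches a forbidden element whose content contains no other tag (because of [^<]+). A variation that removes the opening and closing tags independently, and therefore also copes with nested markup, could look like this. It is only a sketch: unwrapTags() is a hypothetical name, and it assumes attribute values never contain a literal ">".

```php
<?php
// Hypothetical helper: drop the opening and closing tags of each forbidden
// element but keep whatever sits between them.
function unwrapTags(string $input, array $forbidden): string
{
    if ($forbidden === []) {
        return $input;
    }

    // Escape each name so it is safe to embed in the pattern.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '#');
    }, $forbidden));

    // "<" or "</", one of the names, a word boundary, then up to the next ">".
    return preg_replace('#</?(?:' . $names . ')\b[^>]*>#i', '', $input);
}

$flat = unwrapTags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', ['cancel', 'g']);
$nested = unwrapTags('<g>a<b>c</b></g>', ['g']);

echo $flat, "\n";   // abcxpto<p>def></p>xyz<t>xpto</t>
echo $nested, "\n"; // a<b>c</b>
```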

0

Source: https://habr.com/ru/post/925082/

