How to strip tags more safely than with the strip_tags function?

Question

I am having problems with the PHP strip_tags function when a string contains less-than and greater-than characters. For example:

If I do this:

strip_tags("<span>some text <5ml and then >10ml some text </span>");

I will get:

 some text 10ml some text

But obviously I want to get:

 some text <5ml and then >10ml some text

Yes, I know I could use &lt; and &gt;, but I have no way to convert these characters to HTML entities, since the data is already saved in the form shown in my example.

What I'm looking for is a smart way to parse HTML to get rid of real HTML tags only.

Since TinyMCE was used to generate this data, I know which HTML tags can actually occur, so an implementation of strip_tags($string, $black_list) would be more useful than strip_tags($string, $allowable_tags).
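
The blacklist idea could be sketched roughly as follows. This is only a minimal sketch: strip_listed_tags() is a hypothetical helper, not a PHP built-in, and the tag list passed to it is an assumption about what the editor emits. It removes opening and closing tags whose names are on the list and leaves every other angle bracket alone.

```php
<?php
// Hypothetical helper: remove only the tags named in $tags (a blacklist).
function strip_listed_tags(string $html, array $tags): string
{
    if ($tags === []) {
        return $html; // nothing to remove
    }
    // Quote each tag name so it is safe inside the pattern.
    $names = implode('|', array_map(function (string $t): string {
        return preg_quote($t, '~');
    }, $tags));
    // Match <name ...> or </name ...>; \b keeps "p" from matching "pre".
    return preg_replace('~</?(?:' . $names . ')\b[^>]*>~i', '', $html);
}

echo strip_listed_tags(
    "<span>some text <5ml and then >10ml some text </span>",
    ['span', 'p', 'strong', 'em']
);
```

Like any regex-based approach this is a heuristic: ordinary text that happens to start like a listed tag (for example "<p and q>") would be removed too, and attribute values containing ">" are not handled.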

Any thoughts?

+7

dom php html-parsing strip-tags

texai Feb 14 '11 at 18:40

3 answers

If you want to have greater-than and less-than signs in the text, you need to escape them:

> is &gt;

< is &lt;

See this: http://www.w3schools.com/html/html_entities.asp

+4

Piskvor Feb 14 '11 at 18:55

Instead of strip_tags(), use htmlspecialchars().

http://php.net/manual/en/function.htmlspecialchars.php
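
A short illustration of what that does, using a sample string that is only an assumption for demonstration. Note that htmlspecialchars() escapes every angle bracket, including those of the real tags, so the markup is shown as text rather than removed.

```php
<?php
// Sample input chosen for illustration only.
$input = "<span>a <5ml</span>";

// Every < and > becomes an entity, real tags included.
$escaped = htmlspecialchars($input);

echo $escaped;
```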

+2

dqhendricks Feb 14 '11 at 19:17

mario · Accepted Answer · 2011-02-14T18:55:31+0000

As a wacky workaround, you can escape the non-HTML angle brackets with:

 $html = preg_replace("# <(?![/a-z]) | (?<=\s)>(?![a-z]) #exi", "htmlentities('$0')", $html);

Apply strip_tags() afterwards. Note that this only works for your specific example and similar cases. It is a regular expression with some heuristics, not artificial intelligence, for telling HTML tags apart from stray angle brackets that carry a different meaning.
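
The /e modifier in the snippet above was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP the same heuristic can be sketched with preg_replace_callback(). The function name strip_real_tags() is hypothetical, and the final htmlspecialchars_decode() call is an added assumption, included so that the result contains the literal < and > the question asks for.

```php
<?php
// Hypothetical helper: escape stray angle brackets, strip real tags,
// then turn the escaped brackets back into literal characters.
function strip_real_tags(string $html): string
{
    $escaped = preg_replace_callback(
        // "<" not followed by "/" or a letter, or ">" preceded by
        // whitespace and not followed by a letter (x: spaces ignored).
        '~ <(?![/a-z]) | (?<=\s)>(?![a-z]) ~xi',
        function (array $m): string {
            return htmlspecialchars($m[0]);
        },
        $html
    );
    // Assumption: decoding afterwards is wanted; it also decodes any
    // entities such as &amp; that were already present in the input.
    return htmlspecialchars_decode(strip_tags($escaped));
}

echo strip_real_tags("<span>some text <5ml and then >10ml some text </span>");
```

The same caveat applies as for the original: the pattern is a heuristic tuned to input like the example, not a general HTML parser.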
