Libxml Cleaner adds unwanted <p> tag to HTML snippets
I am trying to sanitize user input to prevent XSS injection, using the HTML Cleaner from lxml (the Python binding for libxml2). When I enter a line like this:
Normal text <b>Bold text</b>
I get this instead:
<p>Normal text <b>Bold text</b></p>
I want to get rid of the <p> tag that wraps my entire input.
Here is the function that is currently doing the cleanup:
from lxml.html import clean

cleaner = clean.Cleaner(
    scripts=True,      # remove <script> elements
    javascript=True,   # remove JavaScript such as onclick attributes and javascript: links
    allow_tags=None,
)

def sanitize_html(html):
    return cleaner.clean_html(html)
On an unrelated note, the code above contains the line allow_tags = None, with which I am trying to remove all HTML tags. Does lxml's Cleaner have a whitelist option that lets me allow only certain tags?