Python sanitization

I need a quick-and-dirty input sanitizer, and I basically want to convert every < and > to &lt; and &gt;.

I would like to get the same result as '<script></script>'.replace('<', '&lt;').replace('>', '&gt;') without iterating over the string several times. I know about maketrans in combination with str.translate (e.g. http://www.tutorialspoint.com/python/string_translate.htm), but that only maps one character to another single character. In other words, you cannot do something like:

 inList = '<>'
 outList = ['&lt;', '&gt;']
 transform = maketrans(inList, outList)

Is there a builtin function that can perform this conversion in one iteration?

I would like to use built-in features rather than external modules. I already know about Bleach.

2 answers

You can use cgi.escape():

 import cgi
 inlist = '<>'
 transform = cgi.escape(inlist)
 print transform

Output:

 &lt;&gt; 

https://docs.python.org/2/library/cgi.html#cgi.escape

cgi.escape(s[, quote]) Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character (") is also translated; this helps for inclusion in an HTML attribute value delimited by double quotes. Note that single quotes are never translated.
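
The snippet above targets Python 2. On Python 3, cgi.escape was deprecated in 3.2 and removed in 3.8, and the standard-library replacement is html.escape. Here is a minimal sketch, assuming Python 3; the sample input is made up for illustration. Note that html.escape defaults to quote=True, so passing quote=False is what mirrors the cgi.escape default.

```python
import html

# Hypothetical sample input, not from the original question.
user_input = '<script>alert("hi") & more</script>'

# quote=False mirrors cgi.escape's default: only &, < and > are converted.
escaped = html.escape(user_input, quote=False)
print(escaped)

# With the default quote=True, double and single quotes are escaped as well,
# which is what you want inside an HTML attribute value.
escaped_attr = html.escape(user_input)
print(escaped_attr)
```

Like cgi.escape, this also converts & to &amp;, which goes slightly beyond the two replacements asked for in the question.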


You can define your own function that iterates over the string once and replaces whichever characters you specify.

 def sanitize(input_string):
     output_string = ''
     for i in input_string:
         if i == '>':
             outchar = '&gt;'
         elif i == '<':
             outchar = '&lt;'
         else:
             outchar = i
         output_string += outchar
     return output_string

Then calling

 sanitize('<3 because I am > all of you') 

gives

 '&lt;3 because I am &gt; all of you' 
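
The same single pass can probably be had from str.translate as well, assuming Python 3, where str.maketrans accepts a dict that maps single characters to replacement strings of any length (Python 2's string.maketrans is limited to one-to-one mappings, as the question notes). The names ESCAPE_TABLE and sanitize_translate are made up for this sketch.

```python
# Translation table: each single character maps to a replacement string.
# Assumes Python 3; the names here are hypothetical, chosen for illustration.
ESCAPE_TABLE = str.maketrans({'<': '&lt;', '>': '&gt;'})


def sanitize_translate(input_string):
    """Replace < and > with HTML entities in one pass over the string."""
    return input_string.translate(ESCAPE_TABLE)


print(sanitize_translate('<3 because I am > all of you'))
```

Like the hand-written loop, this leaves & untouched, so it is only suitable when the input is not expected to contain existing entities.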