Python 'eval' security for list deserialization

Question

Are there any security risks that may arise in this scenario:

eval(repr(unsanitized_user_input), {"__builtins__": None}, {"True":True, "False":False})

where unsanitized_user_input is a str object. The string is user-generated and could be nasty. Assuming our web infrastructure hasn't let us down, it is a genuine, honest-to-goodness str instance from the Python builtins.

If this is dangerous, can we do anything to the input to make it safe?

We definitely don't want to execute anything contained in the string.

The wider context, which (I believe) is not important for the question, is that we have thousands of these:

 repr([unsanitized_user_input_1, unsanitized_user_input_2, unsanitized_user_input_3, unsanitized_user_input_4, ...])

in some cases nested:

 repr([[unsanitized_user_input_1, unsanitized_user_input_2], [unsanitized_user_input_3, unsanitized_user_input_4], ...])

which are themselves converted to strings with repr(), put into persistent storage and, eventually, read back into memory with eval.

eval deserializes the strings from persistent storage much faster than pickle and simplejson do. The interpreter is Python 2.5, so json and ast are not available. C modules are not allowed, and neither is cPickle.

+6

python eval

gravitation Jul 11 '09 at 1:25

5 answers

If you are certain that unsanitized_user_input is an unmodified str instance from Python's builtins, then this is always safe. In fact, it is safe even without the extra arguments, since eval(repr(astr)) == astr for all such string objects. You put in a string, you get a string back. All you have done is escape it and then unescape it.
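A small sketch of that round trip, written so that it also runs on Python 3 (the question targets 2.5). The sample strings are made up for illustration.

```python
# Hypothetical sample inputs; any plain built-in str behaves the same way.
samples = [
    "plain text",
    "__import__('os').system('echo pwned')",  # looks like code, but is only data
    "quote ' and \" and backslash \\ and newline \n",
]

round_trips = []
for s in samples:
    literal = repr(s)         # a quoted, escaped string literal
    restored = eval(literal)  # evaluating a string literal only rebuilds the string
    round_trips.append(type(restored) is str and restored == s)
```

Every entry in round_trips should be True: the text that looks like a function call comes back as text, because repr wrapped it in quotes before eval ever saw it.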

All of which makes me think that eval(repr(x)) isn't what you want. No code will ever be executed unless someone hands you an unsanitized_user_input object that looks like a string but isn't one, and that is a different question. As written, you are just copying a string instance in the slowest way possible :D.

+8

hao Jul 11 '09 at 1:42

Given everything you describe, it is technically safe to eval repr'd strings. However, I would avoid doing it anyway, as it is asking for trouble:

There could be some odd corner case where your assumption that only repr'd strings are stored does not hold (for example, a bug, or another path into the storage that skips repr). That instantly becomes a code injection exploit where the flaw might otherwise have been unexploitable.
Even if everything is fine now, assumptions may change at some point, and data that was never repr'd may get stored in this field by someone who does not know about the eval code.
Your code may be reused (or, worse, copied and pasted) in a situation you have not considered.
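To illustrate the first two points, here is a minimal sketch of what happens when a value reaches storage without going through repr. The payload is deliberately harmless arithmetic, and the variable names are made up.

```python
user_input = "40 + 2"  # hypothetical user-supplied text

stored_correctly = repr(user_input)  # "'40 + 2'", a string literal
stored_by_mistake = user_input       # some other code path skipped repr()

restricted = {"__builtins__": None}

# The repr'd value evaluates back to the original text.
safe_result = eval(stored_correctly, restricted, {})

# The raw value is evaluated as an expression, so the user's text runs as code.
unsafe_result = eval(stored_by_mistake, restricted, {})
```

safe_result is the original text, while unsafe_result is whatever the text computed. The eval call is identical in both cases; only the assumption about what was stored differs.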

As Alex Martelli noted, Python 2.6 and later provide ast.literal_eval, which safely handles strings as well as other simple data types such as tuples. This is probably the safest and most complete solution.

Another possibility is to use the string-escape codec. This is much faster than eval (about 10 times faster when timed), is available in earlier versions than literal_eval, and should do what you want:
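On interpreters that have the ast module (2.6 and later, so not the 2.5 from the question), the round trip might look like the sketch below. The sample data is made up.

```python
import ast

# Hypothetical nested data shaped like the lists in the question.
data = [["first", "second"], ["it's \"quoted\"", "line\nbreak"]]

stored = repr(data)                  # what would be written to storage
restored = ast.literal_eval(stored)  # parses literals only, never calls anything

# Anything that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

Unlike eval, this fails closed: a stored value that was never repr'd raises an error rather than running.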

 >>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
 >>> repr(s)[1:-1].decode('string-escape') == s
 True

(The [1:-1] strips the outer quotation marks that repr adds.)

+4

Brian Jul 11 '09 at 10:57

Generally, you should never allow anyone to post code.

So-called "paid professional programmers" have a hard enough time writing code that actually works.

Accepting code from the anonymous public, without the benefit of formal QA, is the worst of all possible scenarios.

Professional programmers, without good, solid formal QA, will make a hash of almost any website. Indeed, I am currently working through some incredibly bad code written by paid professionals.

The idea of allowing a non-professional, unencumbered by QA, to post code is truly terrifying.

+3

S. Lott Jul 11 '09 at 1:38

 repr([unsanitized_user_input_1, unsanitized_user_input_2, ... 
... unsanitized_user_input is a str object

You should not need to serialize strings in order to store them in a database.

If these are all strings, as you mentioned, why can't you just store them in a db.StringListProperty?

The nested entries may be a little more complicated, but why are they nested in the first place? When you have to resort to eval to get data out of a database, you are probably doing something wrong.

Couldn't you store each unsanitized_user_input_x as its own db.StringProperty row, and group the rows by a reference field?

Neither of these may be applicable, because I do not know what you are trying to achieve. My point is: can you not structure the data so that you do not have to rely on eval (and on eval not being a security problem)?

+1

dbr Jul 11 '09 at 16:03

Alex Martelli · Accepted Answer · 2009-07-11T01:33:02+0000

It is indeed dangerous, and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can, of course, build and alter an AST to provide, for example, evaluation of variables and the like before you evaluate the resulting AST (once it is down to literals).
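One way to read that suggestion is sketched below: parse the text, replace each variable name with a known constant, then hand the tree to ast.literal_eval. The class and function names are hypothetical, and the sketch assumes Python 3.8 or later (for ast.Constant), not the 2.5 interpreter from the question.

```python
import ast


class SubstituteNames(ast.NodeTransformer):
    """Replace every variable name with a constant from an allowed mapping."""

    def __init__(self, values):
        self.values = values

    def visit_Name(self, node):
        if node.id not in self.values:
            raise ValueError("unknown name: %r" % node.id)
        constant = ast.Constant(value=self.values[node.id])
        return ast.copy_location(constant, node)


def literal_eval_with_names(source, values):
    """Evaluate a literal expression that may mention the names in `values`."""
    tree = ast.parse(source, mode="eval")
    tree = SubstituteNames(values).visit(tree)
    ast.fix_missing_locations(tree)
    # Only literals remain at this point; anything else raises ValueError.
    return ast.literal_eval(tree)


result = literal_eval_with_names("[limit, 'a', [name, 2]]",
                                 {"limit": 10, "name": "bob"})
```

Because the final step is still literal_eval, calls and attribute access are rejected rather than executed, whatever names were substituted.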

A possible exploit of eval starts with any object it can get its hands on (say, True here), goes via .__class__ to its type object and so on up to object, and then gets its subclasses. Basically, it can reach ANY object type and wreak havoc. I can be more specific, but I would rather not do so in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse). Just avoid eval on unsanitized user input and live happily ever after! -)
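Without going further than the answer does, a minimal sketch can show why emptying __builtins__ is not a sandbox: attribute access needs no builtins, so an expression can still walk from a literal up to object. The expression below only inspects types; it is an illustration, not the answer's own example.

```python
restricted_globals = {"__builtins__": None}

# Looking up a builtin by name fails, as intended...
try:
    eval("open", restricted_globals, {})
    builtin_reachable = True
except Exception:
    builtin_reachable = False

# ...but attribute access still works, so the base class `object` is reachable.
reached = eval("True.__class__.__mro__[-1]", restricted_globals, {"True": True})
```

Here builtin_reachable ends up False, while reached is the object type itself. The restricted namespace blocks names, not the attribute chain the answer describes.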
