Are the Python functions "compile" and "compiler.parse" safe (sandboxed)?

I plan to use these functions in a web environment, so I am worried that they could be exploited to execute malicious code on the server.

Edit: I am not executing the result. I only walk the AST and/or catch a SyntaxError.

This is the code in question:

    try:
        # compile the code and check for syntax errors
        compile(code_string, filename, "exec")
    except SyntaxError, value:
        msg = value.args[0]
        (lineno, offset, text) = value.lineno, value.offset, value.text
        if text is None:
            return [{"line": 0, "offset": 0, "message": u"Problem decoding source"}]
        else:
            line = text.splitlines()[-1]
            if offset is not None:
                offset = offset - (len(text) - len(line))
            else:
                offset = 0
            return [{"line": lineno, "offset": offset, "message": msg}]
    else:
        # no syntax errors, check it with pyflakes
        tree = compiler.parse(code_string)
        w = checker.Checker(tree, filename)
        w.messages.sort(lambda a, b: cmp(a.lineno, b.lineno))

checker.Checker is a pyflakes class that walks the AST.
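
For readers on Python 3, where the `compiler` package no longer exists, the syntax-check half of the snippet above could look roughly like the sketch below. It is an illustration only: the function name `check_syntax` and the default filename are made up here, the pyflakes step is left out, and the offset adjustment simply mirrors the original code.

```python
import ast


def check_syntax(code_string, filename="<user input>"):
    """Return a list of problem dicts; an empty list means the source parsed."""
    try:
        # ast.parse only builds a syntax tree; nothing in code_string is run.
        ast.parse(code_string, filename=filename)
    except SyntaxError as err:
        text = err.text
        if text is None:
            return [{"line": 0, "offset": 0, "message": "Problem decoding source"}]
        lines = text.splitlines()
        line = lines[-1] if lines else ""
        if err.offset is not None:
            # Same adjustment as the original snippet.
            offset = err.offset - (len(text) - len(line))
        else:
            offset = 0
        return [{"line": err.lineno, "offset": offset, "message": err.msg}]
    except ValueError as err:
        # Some Python versions raise ValueError instead, e.g. for null bytes.
        return [{"line": 0, "offset": 0, "message": str(err)}]
    return []
```

Calling `check_syntax("def f(:\n    pass\n")` returns a one-element list describing the error on line 1, while valid source returns an empty list.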

+4
5 answers

compiler.parse and compile could most plausibly be used in an attack if the attacker controls their input and their output is then executed. In most cases you would have to eval or exec their output for it to run, so eval and exec are still the usual suspects; compile and compiler.parse (deprecated, by the way) simply add another step between the malicious input and its execution.

EDIT: I just saw your comment saying that you are actually planning to use them on USER INPUT. Don't do that. Or at least do not actually execute the result. That is a huge security hole for whoever ends up running that code. And if no one is going to run it, why compile it? Since you have clarified that you only want to check the syntax, this should be fine. I would not store the output, though: there is no reason to make anything easier for a potential attacker, and getting arbitrary code onto your system is the first step of an attack.

If you do need to store it, I would probably favor a scheme similar to the one commonly used for uploaded images, where files are renamed in an unpredictable way, with the added step of making sure the file is not stored on the import path.
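
A minimal sketch of that storage scheme, assuming the submissions are kept as plain files. The function name `store_submission`, the `.txt` extension and the 16-byte token length are illustrative choices, not anything prescribed above.

```python
import os
import secrets
import sys


def store_submission(code_string, storage_dir):
    """Write code_string under an unpredictable name and return the path."""
    storage_dir = os.path.realpath(storage_dir)
    # Refuse any directory that Python itself would import modules from.
    import_dirs = {os.path.realpath(p or os.getcwd()) for p in sys.path}
    if storage_dir in import_dirs:
        raise ValueError("storage directory is on the import path")
    # Random name, plus an extension the import system does not treat as a module.
    path = os.path.join(storage_dir, secrets.token_hex(16) + ".txt")
    # Mode "x" fails instead of overwriting an existing file.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(code_string)
    return path
```

This only checks for an exact match against the entries of `sys.path`; a real deployment would also want the directory to be outside the web root and not writable by other users.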

+2

I think the more interesting question is: what are you doing with the compiled functions? Running them is definitely unsafe.

I tested the few exploits I could think of, given that this is just a syntax check (you cannot redefine classes, functions, etc.). I don't think there is any way to get Python to execute arbitrary code at compile time.
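
A small experiment along these lines, written for Python 3 with made-up variable names, shows that compiling a string does not run its statements; they only run once the resulting code object is passed to exec:

```python
side_effects = []

untrusted = "side_effects.append('ran')\nimport os\nos.getcwd()\n"

# Compiling parses the source and builds a code object; no statement is run.
code_obj = compile(untrusted, "<untrusted>", "exec")
compiled_only = list(side_effects)  # still empty at this point

# Only an explicit exec (or eval) runs the statements.
# This is shown for contrast; never do it with real untrusted input.
exec(code_obj, {"side_effects": side_effects})
after_exec = list(side_effects)  # now contains 'ran'
```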

+4

If the resulting code or AST object is never evaluated, I think you are only exposed to denial-of-service attacks.

If you evaluate user-supplied code, that is the same as giving every user shell access with the privileges of the web server user.

+2

They are not, but it is not too hard to find a subset of Python that can be sandboxed. If you want to go down that road, you need to parse that subset of Python yourself and intercept all calls, attribute lookups and everything else involved. You also do not want to give users access to language constructs such as non-terminating loops.

Still interested? Head over to jinja2.sandbox :)
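
As a rough illustration of restricting input to a subset of Python (this is not how jinja2.sandbox works, just a sketch using the standard `ast` module), one can parse the input and reject any node type that is not on an allow-list. The allow-list and the name `is_allowed_expression` below are hypothetical, and the sketch assumes Python 3.8 or later, where literals are `ast.Constant` nodes.

```python
import ast

# Hypothetical allow-list: arithmetic over literals and plain names only.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def is_allowed_expression(source):
    """Return True if source is one expression built only from ALLOWED_NODES."""
    try:
        # mode="eval" accepts a single expression, so statements such as
        # loops or imports are rejected as syntax errors.
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    # Calls and attribute lookups are not on the list, so they are rejected.
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))
```

With this list, `is_allowed_expression("1 + 2 * x")` is true, while `"__import__('os').system('ls')"` and `"x.y"` are rejected. This only validates the input; evaluating it safely still requires controlling what the names resolve to.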

+2

Yes, they can be maliciously used.

If you really want a safe sandbox, you could look at PyPy's sandboxing features, but be aware that sandboxing is not easy, and there may be better ways to accomplish what you are after.

Correction

Since you have updated your question to clarify that you are only parsing the untrusted input into an AST, there is no need to sandbox anything: sandboxing is specifically about executing untrusted code (which is what most people mean when they ask about sandboxing).

Using compile / compiler only for parsing in this way should be safe: Python source parsing has no hooks into code execution. (Note that this is not necessarily true of all languages: for example, Perl cannot be (completely) parsed without executing code.)

The only remaining risk is that someone could craft pathological Python source code that makes one of the parsers consume runaway amounts of memory or processor time. Resource-exhaustion attacks affect everything, though, so you will just want to manage this as it becomes necessary. (For example, if your deployment is mission-critical and cannot afford a denial of service by an attacker armed with pathological source code, you can run the parsing in a resource-limited subprocess.)
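
One possible shape for such a resource-limited subprocess, assuming Python 3 on a POSIX system (the `resource` module is not available on Windows). The function names and the specific limits (2 CPU seconds, 512 MiB of address space, 5 seconds of wall-clock time) are arbitrary values picked for illustration.

```python
import resource
import subprocess
import sys

# Child program: read the source from stdin and parse it, nothing more.
CHILD = "import ast, sys; ast.parse(sys.stdin.buffer.read())"


def _apply_limits(cpu_seconds, memory_bytes):
    """Runs in the child just before exec: cap CPU time and address space."""
    for kind, wanted in ((resource.RLIMIT_CPU, cpu_seconds),
                         (resource.RLIMIT_AS, memory_bytes)):
        soft, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)
        try:
            resource.setrlimit(kind, (wanted, hard))
        except (ValueError, OSError):
            pass  # platform refused this limit; the wall-clock timeout remains


def parses_within_limits(code_string, cpu_seconds=2,
                         memory_bytes=512 * 1024 * 1024, wall_seconds=5):
    """Return True if code_string parses inside the limited child process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            input=code_string.encode("utf-8"),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=wall_seconds,
            preexec_fn=lambda: _apply_limits(cpu_seconds, memory_bytes),
        )
    except subprocess.TimeoutExpired:
        return False
    # Exit code 0 means ast.parse finished; a syntax error, a crash or a
    # kill caused by one of the limits all give a non-zero exit code.
    return result.returncode == 0
```

The parent process never parses the untrusted text itself, so a parser blow-up costs one short-lived child rather than the web server.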

+1
