Your restrictions can be bypassed even if you remove all modules and all functions. Code can still access files if it can use the attributes of any simple object, for example the number zero.
(0).__class__.__base__.__subclasses__()[40]('/etc/pas'+'swd')
Index 40 is implementation-specific, though typical of Python 2.7, and the index of the <type 'file'> subclass is easy to find in any case:
[x for x in (1).__class__.__base__.__subclasses__() if 'fi'+'le' in '%s' % x][0]('/etc/pas'+'swd')
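The same walk can be tried safely without opening any file. The following is a minimal sketch, assuming Python 3 (where the `file` type no longer exists, so it only lists the reachable classes); the names `RESTRICTED_GLOBALS` and `reachable_classes` are hypothetical.

```python
# Globals with every builtin removed: no open(), no __import__(), nothing.
RESTRICTED_GLOBALS = {"__builtins__": {}}


def reachable_classes(expression="(0).__class__.__base__.__subclasses__()"):
    """Evaluate the expression with no builtins and return what it reaches."""
    return eval(expression, RESTRICTED_GLOBALS, {})


classes = reachable_classes()
names = [cls.__name__ for cls in classes]
# Attribute access alone is enough to reach every direct subclass of object.
print(len(classes), "classes reachable, for example:", names[:5])
```

Removing builtins does not help here, because the expression needs nothing but a literal and attribute lookups.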
Any combination of whitelisting and blacklisting is either unsafe or too restrictive, or both. The PyPy sandbox is robust without such compromises:
... This subprocess can run arbitrary untrusted Python code, but all its input/output is serialized to a stdin/stdout pipe instead of being directly performed. The outer process reads the pipe and decides which commands are allowed or not (sandboxing), or even reinterprets them differently ...
A seccomp-based solution can also be quite safe (blog).
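The pipe arrangement in the PyPy quote can be illustrated with a toy example. This is only a sketch of the idea, not PyPy's actual protocol, and the child here is not confined in any way: it merely cooperates. The JSON message format, `CHILD_SOURCE`, `ALLOWED_PATHS`, `decide` and `run_child` are all hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical child: it never opens a file itself. It writes each request
# to stdout and reads the outer process's verdict from stdin.
CHILD_SOURCE = r'''
import json, sys
for path in ("/tmp/allowed.txt", "/etc/passwd"):
    sys.stdout.write(json.dumps({"op": "open", "path": path}) + "\n")
    sys.stdout.flush()
    reply = json.loads(sys.stdin.readline())
'''

ALLOWED_PATHS = {"/tmp/allowed.txt"}  # hypothetical policy


def decide(request):
    """Outer-process policy: allow only 'open' on whitelisted paths."""
    if request.get("op") == "open" and request.get("path") in ALLOWED_PATHS:
        return "allow"
    return "deny"


def run_child():
    """Run the child and answer every request it sends through the pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    verdicts = []
    # readline() returns "" once the child has exited and closed its stdout.
    for line in iter(proc.stdout.readline, ""):
        request = json.loads(line)
        verdict = decide(request)
        verdicts.append((request["path"], verdict))
        proc.stdin.write(json.dumps({"verdict": verdict}) + "\n")
        proc.stdin.flush()
    proc.stdin.close()
    proc.stdout.close()
    proc.wait()
    return verdicts


verdicts = run_child()
print(verdicts)
```

A real sandbox must also prevent the child from performing I/O on its own, which this sketch does not attempt.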
I want to be sure that the function will produce the same result in the future as it does today.
It is easy to write a function whose results are hard to reproduce, and this cannot be easily prevented:
class A(object):
    "This can be any very simple class"
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return repr(self.x)

def strange_function():
... even if you remove everything that depends on time, on a random number generator, on hash-based ordering and so on, it is still easy to write a function that sometimes exceeds the available memory or the timeout limit and sometimes returns a result.
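A minimal sketch of one function of this kind, assuming it relies on the iteration order of a set whose elements use the default identity-based hash. The names `Item` and `order_dependent` are hypothetical and are not taken from the answer.

```python
class Item(object):
    """A very simple class that defines neither __hash__ nor __eq__."""

    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return repr(self.x)


def order_dependent():
    # The default hash is derived from the object's identity (its memory
    # address in CPython), so the set's iteration order may differ between
    # runs even though the code uses no clock and no random module.
    return list(set(Item(i) for i in range(20)))


result = order_dependent()
print(result)
```

The contents are always the same twenty items; only their order is unstable, which is enough to make a byte-for-byte comparison of outputs fail.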
EDIT:
Roman, you recently wrote that you are sure you can trust the user. In that case there is a realistic solution: record the inputs and outputs of the function, write them to a file, and check them on a virtual machine running a remote IPython notebook. (There is a great short tutorial video, remote computing is supported out of the box, and the backend service can be restarted from the notebook's menu in the browser within a second without losing any input or output in the notebook. The notebook is an HTML document built dynamically, step by step, by JavaScript that calls the remote backend in response to your actions.)
You do not need to care about internal calls, only about the global inputs and outputs, until you find a difference. The virtual machine should be able to verify and reproduce the results independently. Configure the firewall so that the machine accepts connections from you but cannot initiate outgoing connections. Configure the file system so that the current user cannot save data, and so nothing other than software components is accessible. Disable database services. Check the inputs and outputs in random order, or start two IPython notebook services on different ports and select a random backend for each input cell in the notebook, or restart the backend process often, before every important step. If you find a difference, debug your code and fix it.
Finally, you can automate this without a "notebook", using only remote IPython computing, if you do not need interactivity.
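The record-and-compare step can be sketched in a few lines. This is a minimal local illustration under stated assumptions: it uses a JSON file and a plain function call instead of a remote IPython backend, and `record_run`, `find_differences`, `double` and the log file name are hypothetical.

```python
import json
import os
import tempfile


def record_run(func, inputs, log_path):
    """Call func on each input and save the input/output pairs to a file."""
    records = [{"input": value, "output": func(value)} for value in inputs]
    with open(log_path, "w") as handle:
        json.dump(records, handle)
    return records


def find_differences(func, log_path):
    """Re-run func on the recorded inputs and list every mismatch."""
    with open(log_path) as handle:
        records = json.load(handle)
    differences = []
    for record in records:
        actual = func(record["input"])
        if actual != record["output"]:
            # (input, recorded output, output seen now)
            differences.append((record["input"], record["output"], actual))
    return differences


def double(value):
    return value * 2


log_path = os.path.join(tempfile.mkdtemp(), "io_log.json")
record_run(double, [1, 2, 3], log_path)
print(find_differences(double, log_path))
```

In the setup described above, the second call would run on the isolated virtual machine, and a non-empty list would be the signal to start debugging.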