Unpickling a function into a different context in Python

I wrote the Python interface for a process-oriented task-distribution system that we develop and use internally at my workplace. While fairly computer-literate, the primary users of this interface are research scientists, not software developers, so ease of use and keeping the interface as simple as possible are paramount.

My library pickles the sequence of inputs into a sequence of pickle files on a shared file server, then spawns jobs that load those inputs, perform the calculation, pickle the results, and exit; the client script then picks up the result pickles and builds a generator that loads and yields the results (or re-raises any exception the calculation function raised).
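Roughly, the client side of that last step looks like this (a simplified sketch only; names such as iter_results and resultPaths are illustrative, not the real API):

    import cPickle

    def iter_results(resultPaths):
        "Yield results in order; re-raise any exception a job pickled instead."
        for path in resultPaths:
            with open(path, "rb") as resultFile:
                ok, payload = cPickle.load(resultFile)
            if ok:
                yield payload
            else:
                raise payload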

This is only useful because the calculation function itself is one of the pickled inputs. cPickle is quite happy to pickle function references, but it requires the pickled function to be re-importable in the same context, which is problematic. I have already solved the problem of finding the module to re-import, but the vast majority of the time it is a top-level function of the user's script that gets pickled, and therefore it has no module path. The only strategy I have found for unpickling such a function on the compute nodes is this nauseating little approach of simulating the original environment in which the function was pickled before unpickling it:

    ...
    # At this point, we've identified the source module of the target function.
    # A string with its name lives in "modname".
    # In the real code, there is significant try/except work here.
    targetModule = __import__(modname)
    globalRef = globals()
    for thingie in dir(targetModule):
        if thingie not in globalRef:
            globalRef[thingie] = targetModule.__dict__[thingie]

    # sys.argv[2]: the path to the pickle file common to all jobs, which contains
    # any data common to all invocations of the target function, then the
    # target function itself
    commonFile = open(sys.argv[2], "rb")
    commonUnpickle = cPickle.Unpickler(commonFile)
    commonData = commonUnpickle.load()

    # the actual function unpickle I'm having trouble with:
    doIt = commonUnpickle.load()

The last line is the important one here: it is where my module picks up the function it is supposed to run. This code, as written, works as desired, but manipulating symbol tables directly like this is troubling.

How can I do this, or something very much like it, in a way that does not force the research scientists to split their calculation scripts into a proper module structure (they use Python as the world's most excellent graphing calculator, and I would like to let them keep doing so) the way pickle desperately wants, and without the unpleasant, unsafe, and just plain scary __dict__-and-globals() manipulation I use above? I feel strongly that there must be a better way, but exec "from {0} import *".format(modname) did not do it, several attempts at injecting the pickle load into the targetModule reference did not do it, and neither did eval("commonUnpickle.load()", targetModule.__dict__, locals()). They all fail during unpickling with an AttributeError about being unable to find the function in <module>.

What's better?

+8
python serialization pickle
4 answers

Pickling functions can be rather annoying when you try to move them into a different context. If the function does not reference anything from the module it lives in, and anything it does reference comes from modules that are guaranteed to be imported, you might look at the code of a database engine recipe found in the Python Cookbook.

To support views, the database module captures the code object from the query callable during pickling. When it comes time to unpickle the view, a LambdaType instance is created from the code object and a reference to a namespace containing all imported modules. The solution has limitations, but it works well enough for the exercise.


View example

    class _View:

        def __init__(self, database, query, *name_changes):
            "Initializes _View instance with details of saved query."
            self.__database = database
            self.__query = query
            self.__name_changes = name_changes

        def __getstate__(self):
            "Returns everything needed to pickle _View instance."
            return self.__database, self.__query.__code__, self.__name_changes

        def __setstate__(self, state):
            "Sets the state of the _View instance when unpickled."
            database, query, name_changes = state
            self.__database = database
            self.__query = types.LambdaType(query, sys.modules)
            self.__name_changes = name_changes
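The same idea can be applied to a bare top-level function by shipping its code object instead of a reference to it. Here is a minimal sketch of mine (not part of the recipe), using marshal for the code object; it assumes the function has no closure or default arguments and that the same Python version runs on both ends:

    import marshal, sys, types

    def pack_function(func):
        "Serialize a top-level function's code object."
        return marshal.dumps(func.__code__)

    def unpack_function(blob, namespace=None):
        "Rebuild a callable from a marshalled code object in the given namespace."
        code = marshal.loads(blob)
        if namespace is None:
            namespace = globals()
        return types.FunctionType(code, namespace, code.co_name)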

Sometimes it becomes necessary to make modifications to the modules registered in the system. If, for example, you need references to the main module (__main__) to resolve properly, you may have to create a new module object with your current namespace loaded into it and register it under the right name. The same recipe used the following technique.


Example for modules

    def test_northwind():
        "Loads and runs some test on the sample Northwind database."
        import os, imp
        # Patch the module namespace to recognize this file.
        name = os.path.splitext(os.path.basename(sys.argv[0]))[0]
        module = imp.new_module(name)
        vars(module).update(globals())
        sys.modules[name] = module

+2

For a module to be recognized as loaded, I think it must be in sys.modules, not merely have its contents imported into your global or local namespace. Try exec-ing everything and then pulling the result out of the artificial environment:

    env = {"fn": sys.argv[2]}

    code = """\
    import %s  # maybe more

    import cPickle

    commonFile = open(fn, "rb")
    commonUnpickle = cPickle.Unpickler(commonFile)
    commonData = commonUnpickle.load()
    doIt = commonUnpickle.load()
    """ % modname  # modname: the module name identified earlier, as in the question

    exec code in env

    return env["doIt"]
+1

While functions are advertised as first-class objects in Python, this is one case where you can see that they are really second-class objects: it is the reference to the callable, not the object itself, that gets pickled. (You cannot pickle a lambda expression directly.)
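You can see this for yourself with a small snippet (illustrative only):

    import cPickle, pickletools

    def square(x):
        return x * x

    pickletools.dis(cPickle.dumps(square))
    # The output contains a GLOBAL opcode naming '__main__ square';
    # unpickling it elsewhere only works if a module called __main__ that
    # defines square can be found -- which is exactly the problem on the
    # compute nodes.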

There is an alternative use of __import__ that you may prefer:

    import sys
    from functools import partial

    def importer(modulename, symbols=None):
        u"importer('foo') returns module foo; importer('foo', ['bar']) returns {'bar': object}"
        if modulename in sys.modules:
            module = sys.modules[modulename]
        else:
            module = __import__(modulename, fromlist=['*'])
        if symbols == None:
            return module
        else:
            return dict(zip(symbols, map(partial(getattr, module), symbols)))

With that, these are all basically equivalent:

    from mymodule.mysubmodule import myfunction
    myfunction = importer('mymodule.mysubmodule').myfunction
    globals()['myfunction'] = importer('mymodule.mysubmodule', ['myfunction'])['myfunction']
0

Your question was long, and I did not get all the way through it... However, I think you want to do something that already has a pretty good existing solution. There is a fork of the parallel python library (i.e. pp) that takes functions and objects, serializes them, sends them to different servers, and then unpickles and executes them. The fork lives inside the pathos package, but you can also download it on its own here:

http://danse.cacr.caltech.edu/packages/dev_danse_us

The “other context” in this case is another server... and the objects are transported by converting them to source code and then back into objects.

If you want to keep using pickling, much the way you are doing it now, there is an extension to mpi4py that serializes arguments and functions and returns pickled return values... The package is called pyina, and it is commonly used to ship code and objects to cluster nodes in coordination with the cluster's scheduler.

Both pathos and pyina provide map (and pipe) abstractions and try to hide all the details of parallel computing behind those abstractions, so scientists don't need to learn anything beyond programming ordinary serial Python. They simply use one of the map or pipe functions and get parallel or distributed calculations.
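For example, the pathos map interface looks roughly like this (a minimal sketch; the exact import path can differ between pathos versions):

    from pathos.multiprocessing import ProcessingPool as Pool

    def busy(x):
        return x ** 2

    pool = Pool(4)                     # four worker processes
    print pool.map(busy, range(10))    # reads just like the builtin map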

Oh, I almost forgot. The dill serializer includes dump_session and load_session , which allow the user to easily serialize the entire interpreter session and send it to another computer (or simply save it for later use). This is very convenient for changing contexts, in another sense.
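Roughly, that looks like this (a small sketch; it assumes compatible Python and dill versions on both ends):

    import dill

    f = lambda x: x + 1            # plain pickle/cPickle cannot serialize this
    blob = dill.dumps(f)           # dill serializes the function object itself
    g = dill.loads(blob)
    assert g(41) == 42

    dill.dump_session('session.pkl')   # capture the whole interpreter session
    # ...later, possibly on another machine:
    dill.load_session('session.pkl')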

Get dill , pathos and pyina here: https://github.com/uqfoundation

0
source share
