The correct way to cache only some class methods with joblib

I am writing a class in which there are some methods based on calculations, and some parameters that the user wants to iteratively adjust and are independent of the calculation.

Actual use is used for visualization, but here is an example of a cartoon:

class MyClass(object): def __init__(self, x, name, mem=None): self.x = x self.name = name if mem is not None: self.square = mem.cache(self.square) def square(self, x): """This is the 'computation heavy' method.""" return x ** 2 def report(self): """Use the results of the computation and a tweakable parameter.""" print "Here you go, %s" % self.name return self.square(self.x) 

The basic idea is that the user may want to create many instances of this class with the same x parameters, but with different name . I want to allow the user to provide a joblib.Memory object that will cache part of the calculations so that they can "communicate" to many different names without having to re-calculate the square of the array every time.

(This is a bit strange, I know. The reason the user needs a separate instance of the class for each name is because they will actually interact with an interface function that looks like this.

 def myfunc(x, name, mem=None): theclass = MyClass(x, name, mem) theclass.report() 

But let's not ignore it now).


Following joblib docs , I cache the square function using the line self.square = mem.cache(self.square) . The problem is that since self will be different for different instances, the array will be redistributed every time, even when the argument is the same.

I assume that the correct way to handle this is to change the line to

 self.square = mem.cache(self.square, ignore=["self"]) 

However, are there any drawbacks to this approach? Is there a better way to do caching?

+7
python memoization joblib
source share
1 answer

From docs ,

If you want to use the cache inside the class, the recommended template is to cache the pure function and use the cached function inside your class.

Since you want memory caching to be optional, I recommend something like this:

 def square_function(x): """This is the 'computation heavy' method.""" print ' square_function is executing, not using cached result.' return x ** 2 class MyClass(object): def __init__(self, x, name, mem=None): self.x = x self.name = name if mem is not None: self._square_function = mem.cache(square_function) else: self._square_function = square_function def square(self, x): return self._square_function(x) def report(self): print "Here you go, %s" % self.name return self.square(self.x) from tempfile import mkdtemp cachedir = mkdtemp() from joblib import Memory memory = Memory(cachedir=cachedir, verbose=0) objects = [ MyClass(42, 'Alice (cache)', memory), MyClass(42, 'Bob (cache)', memory), MyClass(42, 'Charlie (no cache)') ] for obj in objects: print obj.report() 

Execution gives:

 Here you go, Alice (cache) square_function is executing, not using cached result. 1764 Here you go, Bob (cache) 1764 Here you go, Charlie (no cache) square_function is executing, not using cached result. 1764 
0
source share

All Articles