Saving a KDTree object in Python?

I am using the SciPy KDTree implementation on points read from a large (300 MB) file. Is there a way to save the data structure to disk and load it again, or am I stuck with reading the raw points from the file and rebuilding the tree every time I run my program? I create the KDTree as follows:

    def buildKDTree(self):
        self.kdpoints = numpy.fromfile("All", sep=' ')
        self.kdpoints.shape = self.kdpoints.size / self.NDIM, NDIM
        self.kdtree = KDTree(self.kdpoints, leafsize = self.kdpoints.shape[0]+1)
        print "Preparing KDTree... Ready!"

Any suggestions please?

python numpy scipy serialization pickle
1 answer

KDTree uses nested classes to define its node types (innernode, leafnode). Pickle looks classes up by name at module level, so instances of a nested class cannot be pickled:

    import cPickle

    class Foo(object):
        class Bar(object):
            pass

    obj = Foo.Bar()
    print obj.__class__
    cPickle.dumps(obj)

Output:

    <class '__main__.Bar'>
    cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed

However, there is a (hacky) workaround: monkey-patch the nested class definitions into the scipy.spatial.kdtree module at module scope so that the pickler can find them. As long as all of your code that reads or writes pickled KDTree objects installs these patches, this hack should work fine:

    import cPickle
    import numpy
    from scipy.spatial import kdtree

    # patch module-level attribute to enable pickle to work
    kdtree.node = kdtree.KDTree.node
    kdtree.leafnode = kdtree.KDTree.leafnode
    kdtree.innernode = kdtree.KDTree.innernode

    x, y = numpy.mgrid[0:5, 2:8]
    t1 = kdtree.KDTree(zip(x.ravel(), y.ravel()))
    r1 = t1.query([3.4, 4.1])
    raw = cPickle.dumps(t1)

    # read in the pickled tree
    t2 = cPickle.loads(raw)
    r2 = t2.query([3.4, 4.1])

    print t1.tree.__class__
    print repr(raw)[:70]
    print t1.data[r1[1]], t2.data[r2[1]]

Output:

    <class 'scipy.spatial.kdtree.innernode'>
    "ccopy_reg\n_reconstructor\np1\n(cscipy.spatial.kdtree\nKDTree\np2\nc_
    [3 4] [3 4]
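
Once the tree can be pickled, avoiding a rebuild on every run comes down to writing the pickle to a file and reading it back. Below is a minimal sketch of that caching pattern, not a definitive implementation. The helper name `load_or_build` and its parameters are hypothetical, and it uses only the standard `pickle` module (on Python 2, `cPickle` exposes the same `dump`/`load` interface), so it works for any picklable object and has no SciPy dependency:

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return the object pickled at cache_path, or build and cache it.

    Hypothetical helper: `cache_path` is where the pickle is stored and
    `build` is a zero-argument callable that creates the object from scratch.
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the expensive build and unpickle from disk.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: build the object once, then save it for later runs.
    obj = build()
    with open(cache_path, "wb") as f:
        # HIGHEST_PROTOCOL selects the newest binary pickle format available.
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

With the patches above installed, `build` could be a function that reads the points and returns `kdtree.KDTree(points)`. The patches have to run before the cached file is loaded as well as before it is written, and it is worth measuring whether loading the pickle is actually faster than rebuilding the tree for your data.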