How to calculate tree editing distance?

I need to calculate the editing distance between trees for my personal project. This document describes the algorithm, but I cannot make heads or tails out of it. Do you know of any resources that describe the applicable algorithm in a more accessible way? Pseudocode or code will also be useful.

+22
algorithm tree
Jun 30 '09 at 18:34
source share
9 answers

Here's the java source code (gzipped tarball below) for the Edit Edit tree algorithm, which may be useful to you.

The page contains links and several slides that go step by step according to the Zhang and Shasha algorithm and other useful links so you can speed up.

Edit:. Although this answer was accepted because it pointed to the Zhang-Shashi algorithm, the code in the link has errors. Steve Johnson and tim.tadh provided working Python code . See Steve Johnson's comment for more details.

+8
Jun 30 '09 at 19:20
source share

(This answer has been edited to reference the current version of the implementation given by tim.tadh)

This Python library correctly implements the Zhang-Shashi algorithm: https://github.com/timtadh/zhang-shasha

It started as a direct Java source port, indicated in the current accepted answer (the one that is linked to the link in tarball), but this implementation does not follow the rule and is almost impossible to run at all.

+24
Nov 24 '10 at 6:54
source share

I wrote an implementation ( https://github.com/hoonto/jqgram.git ) based on existing PyGram Python code ( https://github.com/Sycondaman/PyGram ) for those of you who want to use tree distance approximation, using the PQ-Gram algorithm in the browser and / or in Node.js.

The jqgram tree tree distance approximation module implements the PQ-Gram algorithm for applications on the server side and on the browser side; O (n log n) and O (n), where n is the number of nodes. See the Academic article from which the algorithm follows: ( http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf )

The PQ-Gram approximation is much faster than getting the true editing distance via Zhang and Shasha, Klein or Gua et al., Which provide true editing distance algorithms that perform a minimum O time (n ^ 2) and therefore are often unsuitable.

Often in real applications, there is no need to know the true editing distance if you can get a relative approximation of several trees to a well-known standard. Javascript, in the browser and now on the server with the advent of Node.js, often with tree structures, and end-user productivity is usually important for the implementation and design of algorithms; this way jqgram.

Example:

var jq = require("jqgram").jqgram; var root1 = { "thelabel": "a", "thekids": [ { "thelabel": "b", "thekids": [ { "thelabel": "c" }, { "thelabel": "d" } ]}, { "thelabel": "e" }, { "thelabel": "f" } ] } var root2 = { "name": "a", "kiddos": [ { "name": "b", "kiddos": [ { "name": "c" }, { "name": "d" }, { "name": "y" } ]}, { "name": "e" }, { "name": "x" } ] } jq.distance({ root: root1, lfn: function(node){ return node.thelabel; }, cfn: function(node){ return node.thekids; } },{ root: root2, lfn: function(node){ return node.name; }, cfn: function(node){ return node.kiddos; } },{ p:2, q:3, depth:10 }, function(result) { console.log(result.distance); }); 

Please note that the lfn and cfn parameters determine how each tree should determine the label names of the node and the child array for each root of the tree independently, so that you can do funny things, for example, comparing an object with the browser DOM. All you have to do is provide these functions along with each root, and jqgram will do the rest by calling your provided lfn and cfn functions to create the trees. So in this sense, it (in my opinion, one way or another) is much easier to use than PyGram. Also its javascript, so use its client or server!

Now one of the approaches you can use is to use jqgram or PyGram to get several close trees, and then use the true editing distance algorithm from a smaller set of trees, why spend all the calculations on trees that you can easily identify very distant trees, or vice versa. This way you can use jqgram to narrow your choices.

Hope the code helps some people.

+5
Jun 15 '13 at 15:34
source share

Here you will find Java implementations of tree editing distance algorithms:

http://www.inf.unibz.it/dis/projects/tree-edit-distance

In addition to the 1989 Zhang & Shasha algorithm, there are also options for editing the editing tree of later algorithms, including Klein 1998, Demaine et al. 2009, and Robust Tree Edit Distance (RTED) algorithm from Pawlik & Augsten, 2011.

+2
Mar 12 '13 at 17:50
source share

I created a simple python shell (apted.py) for the APTED algorithm using jpype if anyone is interested:

 # To use, create a folder named lib next to apted.py, then put APTED.jar into it import os, os.path, jpype global distancePackage distancePackage = None global utilPackage utilPackage = None def StartJVM(): # from http://www.gossamer-threads.com/lists/python/python/379020 root = os.path.abspath(os.path.dirname(__file__)) jpype.startJVM(jpype.getDefaultJVMPath(), "-Djava.ext.dirs=%s%slib" % (root, os.sep)) global distancePackage distancePackage = jpype.JPackage("distance") global utilPackage utilPackage = jpype.JPackage("util") def StopJVM(): jpype.shutdownJVM() class APTED: def __init__(self, delCost, insCost, matchCost): global distancePackage if distancePackage is None: raise Exception("Need to call apted.StartJVM() first") self.myApted = distancePackage.APTED(float(delCost), float(insCost), float(matchCost)) def nonNormalizedTreeDist(self, lblTreeA, lblTreeB): return self.myApted.nonNormalizedTreeDist(lblTreeA.myLblTree, lblTreeB.myLblTree) class LblTree: def __init__(self, treeString): global utilPackage if utilPackage is None: raise Exception("Need to call apted.StartJVM() first") self.myLblTree = utilPackage.LblTree.fromString(treeString) ''' # Example usage: import apted apted.StartJVM() aptedDist = apted.APTED(delCost=1, insCost=1, matchCost=1) treeA = apted.LblTree('{a}') treeB = apted.LblTree('{b{c}}') dist = aptedDist.nonNormalizedTreeDist(treeA, treeB) print dist # When you are done using apted apted.StopJVM() # For some reason it doesn't usually let me start it again and crashes python upon exit when I do this so call only as needed ''' 
+1
Aug 28 '16 at 23:58
source share

There is a version of ICALP2007 magazine that you link to http://www.wisdom.weizmann.ac.il/~oweimann/Publications/TEDinTALG.pdf This version also has a pseudo-code.

0
Jul 02 '09 at 11:26
source share

There are many options for editing a tree. If you can go with a minimal tree editing distance down that limits inserts and removes leaves, I suggest trying the following article: http://portal.acm.org/citation.cfm?id=671669&dl=GUIDE&coll=GUIDE&CFID=68408627&CFTOKEN=54202965 . The implementation is a simple dynamic programming matrix with a cost of O (n2).

0
Dec 15 '09 at 7:29
source share

There is a good version of APTED in Python (distance to edit the tree of all paths), available now: https://pypi.org/project/apted/

Paper (2016): tree editing distance: reliable and memory efficient, Pawlik and Augsten

0
Apr 27 '19 at 10:17
source share

I think you're just looking for the shortest path between two peaks. If so, you can use the Dijkstra algorithm . (There is pseudo-code on the wikipedia page that you can link to.)

-6
Jun 30 '09 at 19:13
source share



All Articles