I wrote an implementation ( https://github.com/hoonto/jqgram.git ) based on existing PyGram Python code ( https://github.com/Sycondaman/PyGram ) for those of you who want to use tree distance approximation, using the PQ-Gram algorithm in the browser and / or in Node.js.
The jqgram tree tree distance approximation module implements the PQ-Gram algorithm for applications on the server side and on the browser side; O (n log n) and O (n), where n is the number of nodes. See the Academic article from which the algorithm follows: ( http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf )
The PQ-Gram approximation is much faster than getting the true editing distance via Zhang and Shasha, Klein or Gua et al., Which provide true editing distance algorithms that perform a minimum O time (n ^ 2) and therefore are often unsuitable.
Often in real applications, there is no need to know the true editing distance if you can get a relative approximation of several trees to a well-known standard. Javascript, in the browser and now on the server with the advent of Node.js, often with tree structures, and end-user productivity is usually important for the implementation and design of algorithms; this way jqgram.
Example:
var jq = require("jqgram").jqgram; var root1 = { "thelabel": "a", "thekids": [ { "thelabel": "b", "thekids": [ { "thelabel": "c" }, { "thelabel": "d" } ]}, { "thelabel": "e" }, { "thelabel": "f" } ] } var root2 = { "name": "a", "kiddos": [ { "name": "b", "kiddos": [ { "name": "c" }, { "name": "d" }, { "name": "y" } ]}, { "name": "e" }, { "name": "x" } ] } jq.distance({ root: root1, lfn: function(node){ return node.thelabel; }, cfn: function(node){ return node.thekids; } },{ root: root2, lfn: function(node){ return node.name; }, cfn: function(node){ return node.kiddos; } },{ p:2, q:3, depth:10 }, function(result) { console.log(result.distance); });
Please note that the lfn and cfn parameters determine how each tree should determine the label names of the node and the child array for each root of the tree independently, so that you can do funny things, for example, comparing an object with the browser DOM. All you have to do is provide these functions along with each root, and jqgram will do the rest by calling your provided lfn and cfn functions to create the trees. So in this sense, it (in my opinion, one way or another) is much easier to use than PyGram. Also its javascript, so use its client or server!
Now one of the approaches you can use is to use jqgram or PyGram to get several close trees, and then use the true editing distance algorithm from a smaller set of trees, why spend all the calculations on trees that you can easily identify very distant trees, or vice versa. This way you can use jqgram to narrow your choices.
Hope the code helps some people.