Identify differences between tree structures

This is a CS question, but interesting:

Let's say we have two tree structures with more or less the same node reorganizations. How would you find

  • any
  • in a sense minimal

sequence of operations

  • MOVE(A, B) - moves node A under node B (with all subtree)
  • INSERT(N, B) - inserts a new node N under node B
  • DELETE (A) - removes node A (with all subtree)

which converts one tree to another.

Probably, there may be cases when such a conversion is impossible, the root A with the descendant B with the root B with the child element A, etc. is trivial.) In such cases, the algorithm simply delivers the result "impossible."

An even more impressive version is a generalization for networks, i.e. when we assume that a node can occur several times in a tree (effectively having several "parents"), while loops are forbidden.

Disclaimer: This is not homework, it actually comes from a real business problem, and I was very interested to know if anyone can find out about the solution.

+56
comparison algorithm computer-science diff tree
May 05 '11 at 8:41 a.m.
source share
4 answers

There is not only a Wikipedia article on graph isomorphism (as Space_C0wb0y points out), but also a special article on the graph isomorphism problem. It has a section Solved special cases , for which polynomial time solutions are known. Trees are one of them, and it provides the following two links:

+15
May 05 '11 at 12:47 a.m.
source share

It was not clear to you whether you were comparing abstract syntax trees for source code, XML documents interpreted as trees, or some other type of tree.

There are a number of articles that discuss comparing syntax trees and calculating minimum distances in various ways. Ideas must be relevant.

Good article Change distillation , which attempts to compare the source code for two abstract syntax trees and report a minimal difference. The document talks about a specific method and also briefly mentions (and gives a link to) many of these methods.

Few of these algorithms are actually implemented in available tools for comparing source text. Our Smart Differencer is one of them.

+12
May 11 '11 at 5:14 a.m.
source share

Although this question is old, I will add a few more links and algorithms below:

In addition, GitHub (in javascript) has libraries and frameworks that distinguish between tree structures, for example, applications that work with JSON data or XML trees (for example, for client side MVC / MVVM):

+8
May 2, '15 at 6:50
source share

In case people find this question and need something that is implemented for Node.js or a browser, I provide an example of the link and code for the implementation that I wrote, which you can find on github here: ( https: // github.com/hoonto/jqgram.git ) based on existing PyGram Python code ( https://github.com/Sycondaman/PyGram ).

This is an algorithm for approximating distance to a tree, but it is much faster than finding the true editing distance. The approximation is performed in O (n log n) time and O (n) space, while the true editing distance is often O (n ^ 3) or O (n ^ 2) using well-known algorithms for the true editing distance. See the Academic article that follows the PQ-Gram algorithm: ( http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf )

So using jqgram:

Example:

 var jq = require("jqgram").jqgram; var root1 = { "thelabel": "a", "thekids": [ { "thelabel": "b", "thekids": [ { "thelabel": "c" }, { "thelabel": "d" } ]}, { "thelabel": "e" }, { "thelabel": "f" } ] } var root2 = { "name": "a", "kiddos": [ { "name": "b", "kiddos": [ { "name": "c" }, { "name": "d" }, { "name": "y" } ]}, { "name": "e" }, { "name": "x" } ] } jq.distance({ root: root1, lfn: function(node){ return node.thelabel; }, cfn: function(node){ return node.thekids; } },{ root: root2, lfn: function(node){ return node.name; }, cfn: function(node){ return node.kiddos; } },{ p:2, q:3 }, function(result) { console.log(result.distance); }); 

And this gives you a number from 0 to 1. The closer to zero, the more closely the two trees are connected in jqgram. One approach might be to use jqgram to narrow on several closely related trees from many trees based on its speed, and then use the true editing distance on the few remaining trees that you need to study more carefully, and for this you can find a python implementation for a link or port algorithm of Zhang and Shashi, for example.

Please note that the lfn and cfn parameters determine how each tree should determine the label names of the node and the child array for each root of the tree independently, so that you can do funny things, for example, comparing an object with the browser DOM. All you have to do is provide these functions along with each root, and jqgram will do the rest by calling your provided lfn and cfn functions to create the trees. So in this sense, it (in my opinion, one way or another) is much easier to use than PyGram. Also its javascript, so use its client or server!

ALSO, to answer the question about loop detection, check the clone method inside jqgram, there is loop detection there, but the credit for this belongs to the node-clone author, from which this part has been slightly modified and included.

+6
Jun 15 '13 at 16:35
source share



All Articles