As Iraq pointed out, there are some options in the lines of Levenshtein, but you would look at the serialization of your object and compare it lexicographically, since, as Ira said, the JSON-specific language that you will not be taken into account (two trees can be identical to JSON, but very differ in Levenshtein distance). What you want is the tree editing distance.
So, to add some details around the tree editing distance, the well-known algorithms used in this space are typical, for example, Zhang and Shasha or Klein, and you can find the Python implementation of Zhang and Shashi. These algorithms will receive the minimum number of changes to convert one tree to another, thereby providing your diff. However, they are somewhat slow O (n ^ 2) in the absolute best possible way, so if you compare a large number of objects or JSON files, you will find yourself improving your golf game, making dishes, underwear, bathing your pets and other household assembly.
And in fact, the art that Iraq is talking about actually consists in the fact that these algorithms are complex and computationally expensive. So what you can do is be creative. One of the methods narrows the number of objects that need to be compared. For example, why calculate the editing distance between two JSON objects that are clearly more like intermediate ones than each other? Do not calculate the editing distance on objects that are identical using lexicographic comparison. If two objects are slightly or very different from each other, perhaps forget about the difference and just say that there should be a direct replacement.
To apply the "art" of tree editing distance, which saves you from unnecessary processor cycles, you need a way to provide a metric around what is meant by "somewhat similar" or "dramatically different."
To this end, I wrote an implementation of the PQ-Gram tree approach algorithm ( http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf ), which you can find on github for use in Node.js or in a browser ( https://github.com/hoonto/jqgram.git ) based on existing PyGram Python code ( https://github.com/Sycondaman/PyGram ).
PQ-Gram is much, much faster than true editing distance algorithms working in O (n log n) and O (n), where n is the number of nodes.
Therefore, I recommend using jqgram very quickly to understand what you are looking at in terms of distance metrics for editing JSON objects. Determine which JSON objects should be compared, instead of just replacing, and then when you want the true distance to get diff, use Klein or Zhang and Shasha to get the actual diff.
Here the jqgram JSON tree object edits an example of approximation of distance taken directly from README to implement jqgram on github:
var jq = require("jqgram").jqgram; var root1 = { "thelabel": "a", "thekids": [ { "thelabel": "b", "thekids": [ { "thelabel": "c" }, { "thelabel": "d" } ]}, { "thelabel": "e" }, { "thelabel": "f" } ] } var root2 = { "name": "a", "kiddos": [ { "name": "b", "kiddos": [ { "name": "c" }, { "name": "d" }, { "name": "y" } ]}, { "name": "e" }, { "name": "x" } ] } jq.distance({ root: root1, lfn: function(node){ return node.thelabel; }, cfn: function(node){ return node.thekids; } },{ root: root2, lfn: function(node){ return node.name; }, cfn: function(node){ return node.kiddos; } },{ p:2, q:3, depth:10 }, function(result) { console.log(result.distance); });
The lfn and cfn parameters determine how each JSON tree should determine the label names of the node and the child array for each root of the tree, so that you can do funny things, for example, compare JSON objects from different sources. All you have to do is provide these functions along with each root, and jqgram will do the rest by calling your provided lfn and cfn functions to create the trees.