Comparing Custom XML Files

I have seen that there are many questions about comparing XML, but none of the ones I looked at solved my problem.

We have text documents in XML format (product descriptions, made up of headings and paragraphs) that are updated from time to time (for example, with each new version), and I have been asked to produce change digests. That is, we want to take two consecutive versions of a file and create a third: the heading structure (outline) should be kept, but only paragraphs that changed should be retained, with additions as well as deletions marked.

So I am trying to find a way to walk both DOM trees in parallel and detect additions and deletions, but I am having trouble detecting them reliably. Obviously I have to do a diff, but I cannot use a plain text diff, because I want a separate diff within each element, and because I cannot use the traditional diff output: the result must be a fully formed XML digest.

Any pointers before I set out to implement a solution to the longest common subsequence problem myself, which would be a huge task?

3 answers

It turned out that there was no ready-made solution for my need at the time. In the meantime I developed my own XML diff procedure, specific to my problem, so I ended up with a working solution.

Then, at the end of 2011, this appeared on Slashdot: "Researchers Expand Diff, Grep Unix Tools"

Dartmouth scientists have introduced variants of the grep and diff Unix command-line utilities that can handle more complex data types. The new programs, called Context-Free Grep and Hierarchical Diff, can parse blocks of data rather than single lines. The research was partly funded by Google and the US Department of Energy.
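
For anyone who wants to roll their own in the meantime, here is a minimal sketch of a per-section digest in Python. It is not the procedure I mentioned above, just an illustration of the idea from the question. It assumes a made-up layout of `<doc>` containing flat `<section>` elements, each with one `<heading>` and some `<p>` children, and it assumes headings are unique so sections can be paired by heading text. The element names, the `change` attribute and the function names are all hypothetical. The paragraph matching is left to `difflib.SequenceMatcher` from the standard library, so there is no need to hand-write a longest common subsequence routine.

```python
import difflib
import xml.etree.ElementTree as ET


def paragraphs(section):
    """Return the stripped text of each direct <p> child of a section."""
    return [(p.text or "").strip() for p in section.findall("p")]


def digest_section(heading, old_paras, new_paras):
    """Build a <section> that keeps the heading but only the changed paragraphs."""
    out = ET.Element("section")
    ET.SubElement(out, "heading").text = heading
    matcher = difflib.SequenceMatcher(None, old_paras, new_paras, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged paragraphs are left out of the digest
        # 'delete' has an empty new range, 'insert' an empty old range,
        # and 'replace' has both, so these two loops cover all three cases.
        for text in old_paras[i1:i2]:
            ET.SubElement(out, "p", change="deleted").text = text
        for text in new_paras[j1:j2]:
            ET.SubElement(out, "p", change="added").text = text
    return out


def digest(old_root, new_root):
    """Pair sections by heading text and diff the paragraphs inside each pair."""
    out = ET.Element("digest")
    old_by_heading = {s.findtext("heading"): s for s in old_root.findall("section")}
    for new_sec in new_root.findall("section"):
        heading = new_sec.findtext("heading")
        old_sec = old_by_heading.pop(heading, None)
        old_paras = paragraphs(old_sec) if old_sec is not None else []
        out.append(digest_section(heading, old_paras, paragraphs(new_sec)))
    # Sections that exist only in the old version: every paragraph is a deletion.
    for heading, old_sec in old_by_heading.items():
        out.append(digest_section(heading, paragraphs(old_sec), []))
    return out


if __name__ == "__main__":
    old = ET.fromstring(
        "<doc><section><heading>Intro</heading><p>A</p><p>B</p></section></doc>"
    )
    new = ET.fromstring(
        "<doc><section><heading>Intro</heading><p>A</p><p>C</p></section></doc>"
    )
    print(ET.tostring(digest(old, new), encoding="unicode"))
```

Because the diff runs once per section, a change in one section cannot be matched against a paragraph in another, which is the "separate diff within each element" requirement from the question. Nested sections, renamed headings and word-level changes inside a paragraph are not handled here.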


I would suggest using XMLUnit as the diff engine. It lets you register a DifferenceListener, which is notified of every difference found between nodes. In the handler, you can take care of adding the appropriate DOM nodes to the target document.


A professional solution to this problem, though not a free one, is the DeltaXML product. Buying it will probably be cheaper than building your own.
