What is the best (word or symbol) algorithm based on differences?

So, I want to find the difference between the two lines based on each word (perhaps faster than for each character, although if each character is faster, then I would like to do it this way).

Here is an example of what I want to achieve: Source:

Hello there! 

Modified text:

 Helay scere? 

Diff:

 Hel[lo](ay) [th](sc)ere[!](?) 
  • parenthesized text is what was deleted, copied text is what was added.

There is a kind of super hacker way to do this using a command line tool such as opendiff , but this requires a newline between each character, since opendiff is line-based.

I use ruby ​​and did not find any tools for this ... but the language is not very important, as the algorithms can be easily ported.

thanks.

+8
string algorithm merge ruby diff
source share
5 answers

Here is the ruby ​​stone that distinguishes the lines: http://rubydoc.info/gems/diff-lcs/1.1.3/frames

Before hands, I just did (in irb)

 require 'rubygems' require 'diff/lcs' require 'diff/lcs/array' require 'diff/lcs/string' 

enter image description here

So, by writing logic for insertion, the built-in deleted and inserted markers become trivial due to this two-dimensional range of changes.

Although I'm not sure if this is the best way.

+2
source share

You can check this out: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem . It is not difficult to implement.

+2
source share

So you can reuse LCS (as indicated above) to find all the regular lines and remove them from both lines, replacing them with another line - let it just say "*". Then you iterate over both lines at the same time, and then combine the common and distinct images again.

Example

 A) Hello there! B) Helay scere? LCS detection gives us ["Hel"," ","ere"], and after replacement we have A) *lo*th*! B) *ay*sc*? Now you split on the delimiter ("*") giving you A) ["lo","th","!"] B) ["ay","sc","?"] 

And from here you just go to make a simple grid. Something key to note that there may be empty entries, for example, if you do this method on “Hell” and “Hel”, you will end up with

 Common LCS) ["Hel"] A) ["l"] B) [""] meaning your result will be Hel[l]() 

Hope this is acceptable.

+2
source share

Take a look at https://github.com/pvande/differ . This stone does what you are looking for.

+2
source share

The solution is to find the editing distance between the lines.

0
source share

All Articles