Search for shared blocks

I have two files (f1 and f2) containing some text (or binary data).
How can I quickly find the blocks they have in common?

e.g.
f1: ABC DEF
f2: XXABC XEF

output:

common blocks:
length 4: "ABC " at f1 @ 0 and f2 @ 2
length 2: "EF" at f1 @ 5 and f2 @ 7

+3

Burkhard Sep 22 '08 at 20:16

3 answers

Wikipedia has pseudo-code for finding the longest common substring of two data sequences. In your case, you simply retrieve from the table all common substrings that are not a prefix of another common substring (i.e., the maximal common substrings).
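To make that concrete, here is a minimal sketch of the table-based approach in Python. It is my own illustration, not the Wikipedia pseudo-code verbatim: the function name `common_blocks` and the `min_len` parameter are hypothetical, and offsets are 0-based. A match is reported only when it cannot be extended to the right, which is what "not a prefix of another common substring" amounts to in the table.

```python
def common_blocks(a, b, min_len=2):
    """Return (length, start_in_a, start_in_b) for each maximal common block.

    a and b may be str or bytes. min_len is a hypothetical knob that
    filters out very short matches.
    """
    blocks = []
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # The run is maximal if it cannot be extended to the right.
                at_end = i == len(a) or j == len(b)
                if (at_end or a[i] != b[j]) and cur[j] >= min_len:
                    blocks.append((cur[j], i - cur[j], j - cur[j]))
        prev = cur  # only the previous row is needed
    return blocks


if __name__ == "__main__":
    for length, pos1, pos2 in common_blocks("ABC DEF", "XXABC XEF"):
        print(f"length {length}: f1 @ {pos1} and f2 @ {pos2}")
```

This takes time proportional to len(f1) * len(f2), so it is fine for small inputs but probably too slow for large files; a suffix-tree or suffix-array based approach would scale better.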

+1

Torsten Marek Sep 22 '08 at 20:25

Duplo: http://sourceforge.net/projects/duplo/

+2

torial Sep 22 '08 at 20:19

The open source PMD project has a copy/paste detection module (CPD), which is listed on this page: http://pmd.sourceforge.net/integrations.html

+1

David Medinets Sep 23 '08 at 12:29
