Search for partial substrings within a string

I have two lines that need to be compared in similarity. The algorithm must be designed to find the maximum similarity. In this case, the order matters, but intermediate (or missing) characters do not. Editing distance cannot be used in this case for various reasons.

The situation is basically this:

string 1: ABCDEFG string 2: AFENBCDGRDLFG 

the resulting algorithm will find the substrings A , BCD , FG

I currently have a recursive solution, but since this has to be done on huge amounts of data, any improvements would be greatly appreciated

+4
source share
1 answer

Looking at your only example, it looks like you want to find the longest common subsequence . Take a look at LCS

Is it just me, or is it NP-hard? - David Titarenko (from comment)

If you want an LCS of arbitrary number of rows to be NP-hard. But the number of input lines is constant (as in this case 2) this can be done in polynomial time.

+5
source

All Articles