String Algorithms

I have two strings (they will eventually be descriptions in a simple database). Let's say they are:

  • String A: "Jimmy buffet apple orange coffee lime"
  • String B: "Car Bike Skateboard"

What I'm looking for is this: a function that takes the input "cocnut" and returns "String A".

There may be differences in capitalization, and the spelling will not always be correct. The goal is a quick and dirty search, if you will.

Are there any .NET (or third-party) libraries, or recommended "similarity algorithms" for strings, that would let me check whether the input is a "pretty close fragment" of a record and return that record? My database will contain about 50 records, tops.

1 answer

What you are looking for is called the edit distance between two strings. There are many implementations; here's one from Stack Overflow itself.
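To make the idea concrete, here is a minimal sketch of the classic (Levenshtein) edit distance. It is written in Python purely for illustration; the function name `edit_distance` is my own, and the same loops translate almost line for line to C#. It counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into the other.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (illustrative helper, not a library API)."""
    # prev[j] = distance between the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("cocnut", "coconut"))  # 1 (one missing letter)
```

Note that this compares whole strings, so a short query scored against a long description gets a large distance even when it appears in it almost verbatim.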

Since you are only looking for part of the string, you want to get a locally optimal match, not the global match calculated by this method.

This is called the local alignment problem, and it is easily solved by an almost identical algorithm. Only two things change: the initialization (we do not penalize anything that comes before the search string) and the selection of the optimal value (we do not penalize anything that comes after the search string).
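The two changes described above can be sketched as follows. This is one possible reading of the idea, not a definitive implementation: the names `fragment_distance` and `best_record` are hypothetical, lowercasing is my assumption for handling the capitalization differences, and Python is used only for illustration. The first row is all zeros (a match may start anywhere in the record for free), and the result is the minimum of the last row (a match may end anywhere for free).

```python
def fragment_distance(query: str, text: str) -> int:
    """Smallest edit distance between query and any substring of text."""
    query, text = query.lower(), text.lower()  # assumption: ignore case
    # Change 1: row 0 is all zeros, so text before the match is not penalized.
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        curr = [i]  # query[:i] against an empty fragment costs i
        for j, tc in enumerate(text, start=1):
            cost = 0 if qc == tc else 1
            curr.append(min(
                prev[j] + 1,         # drop a query character
                curr[j - 1] + 1,     # skip a text character inside the match
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    # Change 2: take the best end position, so text after the match is free.
    return min(prev)


def best_record(query: str, records: dict) -> str:
    """Return the key of the record whose text contains the closest fragment."""
    return min(records, key=lambda name: fragment_distance(query, records[name]))


records = {
    "String A": "Jimmy buffet apple orange coffee lime",
    "String B": "Car Bike Skateboard",
}
print(best_record("cocnut", records))  # String A
```

With about 50 records, scoring every record on each query is cheap. For these sample strings the margin is narrow (a distance of 4 for String A against 5 for String B), so in practice you may want to reject matches whose distance is large relative to the query length.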
