Fuzzy string matching that can skip? For example, "I'm (.*)." has 0 distance to "I'm here."

I am writing a chatbot in Python. Whatever the method (Levenshtein, LCS, regular expressions, etc.), I need a template like "My name is [ A ]." that is smart enough to match strings like:

 My name is Tslmy. #Distance should = 0, and groupdict()['a'] outputs "Tslmy"
 My name is Tesla Tahomana. #Distance should = 0(!), and groupdict()['a'] outputs "Tesla Tahomana"
 my naem ist tslmy . #With a little typo, the distance = 5, and groupdict()['a'] outputs "tslmy "

Here I use groupdict()['a'] to denote whatever [ A ] captured (in regex terms, a named group such as (?P<identifier>match) ).

  • In other words, I'm looking for a "Levenshtein" with wildcards / skips / blanks that ignores the skipped part and picks out what was skipped.
  • Put another way, I'm looking for a fuzzy regular expression that is less strict about the pattern but still provides the good old groupdict(), as well as a "fuzziness" value (an edit distance, needed later to decide which pattern matches a string best).
    This is the preferred solution, because it gives a sufficient groupdict() if it is managed well.
    However, the TRE library and the regex module, which turned out to be the closest solutions, do not seem to give a "fuzziness" value. If this can be solved, so much the better!
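
To make the first bullet concrete, here is a minimal sketch of a "Levenshtein with a skip" using only the standard library. It is not taken from TRE or the regex module; the function names `prefix_distances` and `template_match` and the slot marker `[A]` (written without spaces) are made up for illustration, and the sketch assumes a template with exactly one slot. The text before and after the slot is compared by ordinary edit distance, while whatever the slot absorbs costs nothing and is returned as the capture.

```python
def prefix_distances(pattern, text):
    """Return a list d where d[k] is the Levenshtein distance
    between `pattern` and the prefix text[:k]."""
    row = list(range(len(text) + 1))  # empty pattern vs text[:k] costs k insertions
    for i, pc in enumerate(pattern, start=1):
        new_row = [i]  # pattern[:i] vs empty text costs i deletions
        for k, tc in enumerate(text, start=1):
            new_row.append(min(
                row[k] + 1,               # delete pc from the pattern
                new_row[k - 1] + 1,       # insert tc into the pattern
                row[k - 1] + (pc != tc),  # match or substitute
            ))
        row = new_row
    return row


def template_match(template, text, slot="[A]"):
    """Return (distance, captured) for a template with one slot.

    The slot (a hypothetical marker, not a library feature) may absorb
    any substring of `text` at zero cost.
    """
    prefix, _, suffix = template.partition(slot)
    n = len(text)
    # pre[k] = distance(prefix, text[:k])
    pre = prefix_distances(prefix, text)
    # suf[m] = distance(suffix, text[m:]), computed on the reversed strings
    suf = prefix_distances(suffix[::-1], text[::-1])[::-1]

    best = None
    best_k = 0  # index of the cheapest prefix cut seen so far (k <= m)
    for m in range(n + 1):
        if pre[m] < pre[best_k]:
            best_k = m
        total = pre[best_k] + suf[m]
        if best is None or total < best[0]:
            best = (total, best_k, m)
    distance, k, m = best
    return distance, text[k:m]


print(template_match("My name is [A].", "My name is Tslmy."))
# (0, 'Tslmy')
print(template_match("My name is [A].", "My name is Tesla Tahomana."))
# (0, 'Tesla Tahomana')
print(template_match("My name is [A].", "my naem ist tslmy ."))
```

When several cuts tie for the lowest cost, as can happen with the typo example, this sketch keeps the first one it finds, so the captured text in that case depends on the tie-breaking rule rather than on anything principled. The comparison is also case-sensitive, so "my" against "My" counts as one edit.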

Is it possible? Thank you for your attention.

Update:

I decided to use the powerful regex module, but I still cannot get a "fuzziness" value out of it.

Since the question on this page is resolved in principle, adding much more to it would be inappropriate. I have therefore asked another question about this new problem, and I hope you can solve it!

2 answers

You can use RegEx for basic matching:

 r"My name is (\w+(?:\s\w+)?)\."

And then use the TRE library to allow for variations (approximate matches).
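
As a rough illustration of the exact-matching half of this suggestion, the sketch below uses the standard `re` module with a variant of the pattern: the group is named `a` (an assumption, chosen to mirror the groupdict()['a'] in the question), the final period is escaped, and an optional second word is allowed so that two-word names are captured. The helper name `capture_name` is hypothetical. It does no fuzzy matching at all, so misspelled input simply fails to match.

```python
import re

# Exact template: "My name is <one or two words>."
NAME_RE = re.compile(r"My name is (?P<a>\w+(?:\s\w+)?)\.")


def capture_name(text):
    """Return the text captured by group 'a', or None if there is no match."""
    match = NAME_RE.search(text)
    return match.groupdict()["a"] if match else None


print(capture_name("My name is Tslmy."))           # Tslmy
print(capture_name("My name is Tesla Tahomana."))  # Tesla Tahomana
print(capture_name("my naem ist tslmy ."))         # None
```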


DAT REGEX O_O

 (?i)(?:(?:my|ym).?|.?(?:my|ym))\s+(?:.?(?:..me|n..e|na..)|(?:..me|n..e|na..).?)\s+(?:(?:is|si).?|.?(?:is|si))\s+(\w[\w\s]*)\s*

Let's break it down:

  • (?i) : sets the i modifier, making the match case-insensitive.
  • (?:(?:my|ym).?|.?(?:my|ym)) : matches my, ym, My, Ym, mya, amy etc...
  • \s+ : matches whitespace one or more times
  • (?:.?(?:..me|n..e|na..)|(?:..me|n..e|na..).?) : matches name, naao, tame, lame, n99e, names, Naats etc...
  • \s+ : matches whitespace one or more times
  • (?:(?:is|si).?|.?(?:is|si)) : matches is, si, ist, sit, siR etc...
  • \s+ : matches whitespace one or more times
  • (\w[\w\s]*) : matches a word character followed by zero or more word characters or whitespace, and captures them (the capture must start with a word character \w )
  • \s* : matches whitespace zero or more times
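
To check the pieces listed above against the examples from the question, here is a small sketch that assembles them into one pattern with the standard `re` module. The name alternation is written as `..me|n..e|na..`, which is what the listed examples tame and lame require; that reading is an assumption. The helper name `extract_name` is hypothetical. Note that this only captures the name and does not produce any distance or "fuzziness" value.

```python
import re

# The pieces of the tolerant pattern, one per line, in the order listed above.
TOLERANT_RE = re.compile(
    r"(?i)"                                           # case-insensitive
    r"(?:(?:my|ym).?|.?(?:my|ym))"                    # "my" with small typos
    r"\s+"
    r"(?:.?(?:..me|n..e|na..)|(?:..me|n..e|na..).?)"  # "name" with small typos
    r"\s+"
    r"(?:(?:is|si).?|.?(?:is|si))"                    # "is" with small typos
    r"\s+"
    r"(\w[\w\s]*)"                                    # the captured name
    r"\s*"
)


def extract_name(text):
    """Return the captured name, or None if the text does not match."""
    match = TOLERANT_RE.search(text)
    return match.group(1) if match else None


print(repr(extract_name("My name is Tslmy.")))           # 'Tslmy'
print(repr(extract_name("My name is Tesla Tahomana.")))  # 'Tesla Tahomana'
print(repr(extract_name("my naem ist tslmy .")))         # 'tslmy '
```

The trailing space in the last capture comes from `[\w\s]*` being greedy; calling `.strip()` on the result removes it if that is not wanted.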

