Is it possible to compute the edit distance between a regular expression and a string?

If yes, please explain.

Regarding edit distance: "The distance between two strings is defined as the minimum number of edits needed to transform one into the other."

For example, turning xyz into XYZ takes 3 edits, so the string xYZ is closer to XYZ than xyz is.
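To make the counts above concrete, here is a minimal sketch of the plain Levenshtein distance between two strings. It is an illustration only: it handles insertions, deletions and substitutions, not the transpositions that Damerau-Levenshtein adds, and the function name `levenshtein` is just a label chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


print(levenshtein("xyz", "XYZ"))  # 3
print(levenshtein("xYZ", "XYZ"))  # 1
```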

If the pattern is [0-9]{3} (which matches, for example, 123), then a23 is closer to the pattern than ab3.

How can you find the shortest distance between a regular expression and a string that does not match it?

The distance described above is the Damerau-Levenshtein distance.

2 answers

You can use finite-state machines to do this efficiently (i.e. in linear time). If you use a transducer, you can even write the specification of the transformation quite compactly and express far more nuanced transformations than simple insertions or deletions. See the Wikipedia article on finite-state transducers as a starting point, as well as software such as the FSA or FSA6 toolkit (which has a not entirely stable web demo). There are many libraries for FSA manipulation; I don't want to suggest that these two are your only or best options, they are just two that I have heard of.
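As a rough illustration of the automaton idea (not the FSA toolkit's API, and not a transducer library), the sketch below keeps one cost per automaton state and sweeps over the input once, so the work grows linearly with the string length for a fixed automaton. The automaton for [0-9]{3} is built by hand, and the names `EDGES`, `relax_insertions` and `regex_distance` are assumptions made for this example. It counts insertions, deletions and substitutions only, not transpositions.

```python
import math

DIGITS = set("0123456789")

# Hypothetical hand-built automaton for [0-9]{3}:
# state -> list of (allowed characters, next state).
EDGES = {0: [(DIGITS, 1)], 1: [(DIGITS, 2)], 2: [(DIGITS, 3)], 3: []}
START, ACCEPTING = 0, {3}


def relax_insertions(cost):
    """Follow edges without consuming input (inserting a character), cost 1 each."""
    changed = True
    while changed:
        changed = False
        for q, out in EDGES.items():
            for _chars, q2 in out:
                if cost[q] + 1 < cost[q2]:
                    cost[q2] = cost[q] + 1
                    changed = True


def regex_distance(s):
    """Fewest edits that turn s into some string accepted by the automaton."""
    cost = {q: math.inf for q in EDGES}
    cost[START] = 0
    relax_insertions(cost)
    for c in s:
        # Deleting c keeps the automaton in the same state.
        new = {q: cost[q] + 1 for q in EDGES}
        for q, out in EDGES.items():
            for chars, q2 in out:
                step = 0 if c in chars else 1  # match or substitute
                new[q2] = min(new[q2], cost[q] + step)
        relax_insertions(new)
        cost = new
    return min(cost[q] for q in ACCEPTING)


for s in ("123", "a23", "ab3"):
    print(s, regex_distance(s))  # 123 0, a23 1, ab3 2
```

For a general pattern you would first need to compile the regex into such an edge table, which this sketch does not attempt.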

If, however, you only need efficient approximate searching, there is a less flexible but already implemented option: TRE, which has an approximate matching function that returns the cost of the match, which is the distance to the match from your point of view.

If you mean the smallest Levenshtein distance between the sample string and the closest string that matches the pattern, then I'm fairly sure it can be done, but you would have to convert the regex to a DFA yourself. Then try to match, and whenever a transition fails, continue non-deterministically as if it had succeeded, keeping track of the number of differences. You could use an A* search or something similar for this. It would be quite inefficient, though (O(2^n) in the worst case).
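A minimal sketch of that search, under some loud assumptions: the DFA for [0-9]{3} is written out by hand rather than compiled from a regex, plain Dijkstra (A* with a zero heuristic) stands in for A*, and the names `DELTA` and `closest_match` are invented for this example. It explores (input position, DFA state) pairs cheapest first and returns both the number of differences and one closest matching string. Transpositions are not handled.

```python
import heapq
import re

# Hypothetical hand-built DFA for [0-9]{3}: state -> {character: next state}.
DELTA = {k: {d: k + 1 for d in "0123456789"} for k in range(3)}


def closest_match(s, delta, start, accepting):
    """Return (edits, matching string) for a closest accepted string, or None."""
    # Heap entries: (cost so far, input chars consumed, DFA state, output so far).
    heap = [(0, 0, start, "")]
    seen = set()
    while heap:
        cost, i, q, out = heapq.heappop(heap)
        if (i, q) in seen:
            continue
        seen.add((i, q))
        if i == len(s) and q in accepting:
            return cost, out
        if i < len(s):
            # Drop s[i] from the input without moving in the DFA.
            heapq.heappush(heap, (cost + 1, i + 1, q, out))
        for ch, q2 in delta.get(q, {}).items():
            # Take a transition without consuming input (insert ch).
            heapq.heappush(heap, (cost + 1, i, q2, out + ch))
            if i < len(s):
                # Consume s[i]: free if it equals ch, otherwise a substitution.
                heapq.heappush(heap, (cost + (ch != s[i]), i + 1, q2, out + ch))
    return None


cost, match = closest_match("a23", DELTA, 0, {3})
print(cost, match, bool(re.fullmatch(r"[0-9]{3}", match)))
```

Because each (position, state) pair is expanded at most once, this particular sketch stays polynomial once the DFA exists; the exponential blow-up mentioned above can come from building the DFA itself.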
