When a regex engine does a lot of backtracking, matching becomes very slow. Backtracking happens whenever different parts of your expression could each match the same pieces of the input, and it is at its worst when there is no match at all: the engine must try every possible way of dividing the input among the parts of your regular expression before it can give up.
Consider this simple example from your regular expression: using [a-zA-Z0-9]+[_-]*[a-zA-Z0-9]* to match M45766235H. Note that two of the subexpressions can each cover every character from the second one onward, so the engine must consider all of these possibilities:
[a-zA-Z0-9]+    [a-zA-Z0-9]*
------------    ------------
M45766235H      <nothing>
M45766235       H
M4576623        5H
M457662         35H
M45766          235H
M4576           6235H
M457            66235H
M45             766235H
M4              5766235H
M               45766235H
Since there is no match here, those are ten wasted attempts right there. And that is not the end of it: when other subexpressions with overlapping coverage are in the mix, these ten possibilities get re-checked for every candidate split elsewhere in the string.
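The ten attempts in the table can be listed mechanically. A minimal sketch (the string and pattern are the ones from above; the loop mirrors what the backtracking engine tries, it is not the engine itself):

```python
s = "M45766235H"

# The greedy [a-zA-Z0-9]+ first grabs the whole string, then gives back one
# character at a time so the trailing [a-zA-Z0-9]* can try to match the rest.
# ([_-]* matches the empty string in every attempt, so it is omitted here.)
splits = [(s[:i], s[i:]) for i in range(len(s), 0, -1)]
for head, tail in splits:
    print(f"{head!r:>14} {tail!r}")
print(len(splits), "attempts")  # 10
```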
To say that the costs of backtracking add up quickly is an understatement: they multiply, and in pathological cases grow exponentially, eventually consuming large amounts of CPU time.
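One way to see the cost is to time both patterns against an input that cannot match. This is a sketch under assumed conditions: the input length and the trailing "!" terminator are arbitrary choices, and absolute times depend on the engine (Python's re here).

```python
import re
import time

# A long run of word characters followed by one character that can never
# match forces the engine to try every split before giving up.
bad_input = "a" * 300 + "!"

for name, pattern in [("ambiguous", r"^[\w]+[\w\-_.]*[\w]+$"),
                      ("rewritten", r"^\w[\w\-_.]*\w$")]:
    start = time.perf_counter()
    assert re.match(pattern, bad_input) is None  # both fail; only the cost differs
    print(f"{name}: {time.perf_counter() - start:.4f} s")
```

The ambiguous pattern has to give back characters across all three subexpressions before failing; the rewritten one only backtracks through the middle class once.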
The moral of this story is to reduce the amount of backtracking your pattern allows. For example, your second expression
^[\w]+[\w-_\.]*[\w]+$
can be rewritten as follows:
^\w[\w-_\.]*\w$
The two expressions match the same set of inputs, but when there is a match the rewritten expression matches in exactly one way, while the original expression has roughly N²/2 different ways to split a matched string of N word characters among its three subexpressions (one for each choice of where the first part ends and where the last part begins).
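Both claims can be spot-checked, assuming Python's re semantics (the "-" inside the class is escaped as \- in this sketch so the literal dash is unambiguous):

```python
import re

original = re.compile(r"^[\w]+[\w\-_.]*[\w]+$")
rewritten = re.compile(r"^\w[\w\-_.]*\w$")

# Same language: both require a word character at each end, total length >= 2,
# with word characters, '-', '_' and '.' allowed in between.
for s in ["ab", "a_b", "foo-bar.baz", "a", "", "-ab", "ab-", "a.b.c"]:
    assert bool(original.match(s)) == bool(rewritten.match(s))

# For a matched string of N word characters, the original pattern can split it
# among its three subexpressions in N*(N-1)/2 ways: the first and last parts
# take at least one character each, the middle takes whatever is left.
def splits(n):
    return sum(1 for i in range(1, n) for j in range(0, n - i))  # last part gets n-i-j >= 1

print(splits(10))  # 45, i.e. 10*9/2
```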
For further reading, see: Efficiency of greedy versus lazy quantifiers.
dasblinkenlight