One optimization is to extract common prefixes. Change occurrences such as
(This is some text|This is some other text)
to
This is some (text|other text)
The same extraction should be applied at every level of nesting. Change occurrences such as
ABCD|ADCB|BACD|BADC|BCAD|BCDA|BDAC|BDCA|CABD
to
A(BCD|DCB)|B(A(CD|DC)|C(AD|DA)|D(AC|CA))|CABD
The point of this optimization is that the regex engine no longer has to test the same characters multiple times.
This can be achieved by sorting the phrases and comparing adjacent elements. Be careful not to split metacharacters: you do not want to cut a sequence such as .* or \. down the middle.
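As a rough sketch of the "do not split metacharacters" point, the Python below compares two alternatives token by token instead of character by character. The names TOKEN, tokenize and factor_pair are made up for this illustration, and the tokenizer is a deliberate simplification: it assumes the alternatives contain only single characters, backslash escapes and the quantifiers *, + and ?, so groups, character classes, {m,n} and lazy quantifiers are not handled.

```python
import re

# Assumption: one token is a single character or a backslash escape,
# optionally followed by one of the quantifiers * + ?.
TOKEN = re.compile(r"(?:\\.|[^\\])[*+?]?", re.S)

def tokenize(pattern):
    """Split a pattern into tokens so that \\. and .* stay in one piece."""
    return TOKEN.findall(pattern)

def factor_pair(left, right):
    """Pull the longest common token prefix out of two alternatives."""
    a, b = tokenize(left), tokenize(right)
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    if n == 0:
        # Nothing in common: keep a plain alternation.
        return left + "|" + right
    return "".join(a[:n]) + "(?:" + "".join(a[n:]) + "|" + "".join(b[n:]) + ")"

print(factor_pair(r"a\.b.*c", r"a\.b.*d"))  # a\.b.*(?:c|d)
print(factor_pair(r"a.*x", r"a.+y"))        # a(?:.*x|.+y)
```

In the second call a character-wise comparison would have taken "a." as the prefix and separated the dot from its quantifier; comparing tokens stops at "a". After sorting, alternatives that share a prefix end up next to each other, so applying this to neighbours is enough to find the candidates.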
Another way would be to build a trie from the phrases and read the shared prefixes off its branches. It is more robust, but a bit more complicated.
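A minimal sketch of the trie idea in Python, assuming the phrases are literal strings (each character is escaped with re.escape, so no metacharacters are interpreted). The helper names build_trie and trie_to_regex and the END marker are invented for this example, and non-capturing groups are used so the result does not add capture groups.

```python
import re

END = ""  # hypothetical end-of-phrase marker; cannot clash with a one-character key

def build_trie(words):
    """Store each word as a path of nested dicts, one character per level."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def trie_to_regex(node):
    """Turn a trie node into a pattern matching every suffix stored below it."""
    is_end = END in node
    alts = [re.escape(ch) + trie_to_regex(child)
            for ch, child in sorted(node.items()) if ch != END]
    if not alts:
        return ""
    if len(alts) == 1 and not is_end:
        # A single branch needs no group.
        return alts[0]
    body = "(?:" + "|".join(alts) + ")"
    # If a phrase may also end here, the remaining suffixes are optional.
    return body + "?" if is_end else body

words = ["ABCD", "ADCB", "BACD", "BADC", "BCAD", "BCDA", "BDAC", "BDCA", "CABD"]
pattern = trie_to_regex(build_trie(words))
print(pattern)
# (?:A(?:BCD|DCB)|B(?:A(?:CD|DC)|C(?:AD|DA)|D(?:AC|CA))|CABD)
```

Apart from the non-capturing groups and the outer wrapper, this is the same factoring as the hand-written example above, and every level is handled by the recursion.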
Markus jarderot