I am working on a problem where I have to match a text message against thousands of regular expressions of the form
<some string> {0 to 300 chars} <some string> {0 to 300 chars}
e.g.
"on"[ \t\r]*(.){0,300}"."[ \t\r]*(.){0,300}"from"
A real example could be:
"Dear"[ \t\r]*"Customer,"[ \t\r]*"Your"[ \t\r]*"package"[ \t\r]*(.){0,80}[ \t\r]*"is"[ \t\r]*"out"[ \t\r]*"for"[ \t\r]*"delivery"[ \t\r]*"via"(.){0,80}[ \t\r]*"Courier,"[ \t\r]*(.){0,80}[ \t\r]*"on"(.){0,80}"."[ \t\r]*"Delivery"[ \t\r]*"will"[ \t\r]*"be"[ \t\r]*"attempted"[ \t\r]*"in"[ \t\r]*"5"[ \t\r]*"wkg"[ \t\r]*"days."
First, I used the Java regex engine and matched the input string against one regex at a time. This was too slow. I found that the Java regex engine compiles regexes into NFAs (non-deterministic finite automata) and matches them by backtracking, which can become very slow because of catastrophic backtracking. So I thought about converting the regular expressions to DFAs (deterministic finite automata), using the flex lexer to compile hundreds of regular expressions into a single DFA, so that I would get a match result in O(n) time, where n is the length of the input string. But because of the bounded repetitions in the regexes (such as (.){0,300}), flex takes forever to compile them.
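For reference, here is a minimal sketch of the one-regex-at-a-time baseline described above, not the actual code. The class name `SequentialMatcher` and the method `matchingIndices` are hypothetical, and the sketch assumes the flex-style patterns have already been translated by hand into `java.util.regex` syntax (for example, the quoted literal "." becomes `\\.`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: tries every pattern against the message, one at a time.
class SequentialMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    // Compile each expression once, up front, so matching does not recompile.
    // DOTALL is not set, so '.' does not match line terminators.
    SequentialMatcher(List<String> regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
    }

    // Returns the indices of all patterns that occur somewhere in the message.
    // Cost grows with the number of patterns, which is the bottleneck here.
    List<Integer> matchingIndices(String message) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(message).find()) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

With this approach every message is scanned once per pattern, and each scan may backtrack inside the `.{0,300}` gaps, so the total time grows with both the number of patterns and the amount of backtracking per pattern.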
, . ? , , - ( )
"on"[ \t\r]*(.)*"."[ \t\r]*(.)*"from"
. , , ("on", "." and "from") . , flex , , flex .
. ?