Along the lines of what you were thinking, std::vector<std::pair<boost::regex, int> > is probably the most effective; you will have to try each regular expression in turn until you find a match.
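As a rough illustration of that linear scan, here is a minimal sketch. It uses std::regex in place of boost::regex so that it compiles without Boost. The names firstMatch and makeTable, the sample patterns, and the choice of -1 for "no match" are assumptions made for the example, not part of any library.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tries each pattern in order and returns the int paired
// with the first one that matches the whole input, or -1 if none matches.
int firstMatch(const std::vector<std::pair<std::regex, int>>& table,
               const std::string& input) {
    for (const auto& entry : table) {
        if (std::regex_match(input, entry.first)) {
            return entry.second;
        }
    }
    return -1;  // assumed convention for "no match"
}

// Illustrative table only; the patterns and values are made up.
std::vector<std::pair<std::regex, int>> makeTable() {
    return {
        {std::regex("[0-9]+"), 10},
        {std::regex("[a-z]+"), 20},
    };
}
```

The cost here grows with the number of patterns, since every pattern may be tried against the input before one matches.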
The best solution, if you are up to the work, is to implement your own regular expression class without captures (the (...) operator in most regular expression syntaxes). Without captures, it is fairly easy to convert an expression to a pure DFA, and you can "or" regular expressions together, with each one returning a different accept code. This is how my own regular expression class works. For most applications it is not as flexible as Boost, because it does not support captures, but it does allow things like:
RegularExpression t1( "expr1", 0 );
RegularExpression t2( "expr2", 1 );
// ...
RegularExpression t = t1 | t2 /* | t3 | t4 | ... */ ;
On a match, it returns 0 if expr1 matched, 1 if expr2 matched, and so on; you can use the match identifier as an index into a vector of int. (It returns -1 if there is no match.)
Done this way, the lookup time is linear in the length of the input, regardless of the number of expressions you are trying to match. (I developed my RegularExpression class over 20 years ago for generating a compiler front end. About 15 years ago, I reworked it to handle UTF-8 input.)
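To show why a combined DFA gives one pass over the input no matter how many alternatives are merged, here is a minimal sketch. It is not my RegularExpression class: Dfa, literal and this operator| are hypothetical names, the building blocks are plain literal strings rather than compiled regular expressions, and the merge is an ordinary product construction. When both sides accept, the left operand's accept code is assumed to win.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal DFA over bytes. State 0 is the start state;
// -1 means "no transition" in next and "not accepting" in accept.
struct Dfa {
    std::vector<std::array<int, 256>> next;
    std::vector<int> accept;

    int addState() {
        std::array<int, 256> row;
        row.fill(-1);
        next.push_back(row);
        accept.push_back(-1);
        return static_cast<int>(next.size()) - 1;
    }

    // Whole-string match: one table lookup per input byte.
    int match(const std::string& input) const {
        assert(!next.empty());
        int s = 0;
        for (unsigned char c : input) {
            s = next[s][c];
            if (s < 0) return -1;
        }
        return accept[s];
    }
};

// DFA that accepts exactly `word` and reports `code` (stand-in for a compiled regex).
Dfa literal(const std::string& word, int code) {
    Dfa d;
    int s = d.addState();
    for (unsigned char c : word) {
        int t = d.addState();
        d.next[s][c] = t;
        s = t;
    }
    d.accept[s] = code;
    return d;
}

// Union by product construction: each new state stands for a pair of states
// (one from a, one from b), where -1 means that side is already dead.
Dfa operator|(const Dfa& a, const Dfa& b) {
    Dfa out;
    std::map<std::pair<int, int>, int> ids;
    std::vector<std::pair<int, int>> work;
    auto intern = [&](int sa, int sb) -> int {
        std::pair<int, int> key(sa, sb);
        auto it = ids.find(key);
        if (it != ids.end()) return it->second;
        int id = out.addState();
        ids[key] = id;
        int ca = sa >= 0 ? a.accept[sa] : -1;
        int cb = sb >= 0 ? b.accept[sb] : -1;
        out.accept[id] = ca >= 0 ? ca : cb;  // left operand has priority
        work.push_back(key);
        return id;
    };
    intern(0, 0);
    while (!work.empty()) {
        std::pair<int, int> cur = work.back();
        work.pop_back();
        int from = ids[cur];
        for (int c = 0; c < 256; ++c) {
            int ta = cur.first >= 0 ? a.next[cur.first][c] : -1;
            int tb = cur.second >= 0 ? b.next[cur.second][c] : -1;
            if (ta < 0 && tb < 0) continue;  // both sides dead: no transition
            int to = intern(ta, tb);
            out.next[from][c] = to;
        }
    }
    return out;
}
```

With Dfa t = literal("if", 0) | literal("int", 1) | literal("while", 2), the call t.match("int") yields 1 and t.match("in") yields -1, and the returned code can be used directly as an index into a std::vector<int>. All of the per-alternative cost is paid once, when the tables are built.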
For many years the code was available on the Internet, but I do not have a web page at present, so unless someone has saved an old copy, it no longer is. I would be happy to send it to you, but be warned that the library has not been maintained for a while, so it may not be trivial to get it to compile with a current compiler. (It was originally written in standard C++, and still contains a number of workarounds to get it to compile with things like Sun CC 4.x.)