Optimization methods used by std :: regex_constants :: optimize

I work with std::regex and read about the various constants defined in std::regex_constants , I came across std::optimize , reading about it, it looks like it is useful in my application (I only need one instance of the regex initialized at the beginning, but it is used several times during the boot process).

According to working paper n3126 (p. 1077), std::regex_constants::optimize :

Indicates that the regular expression engine should pay more attention to the speed at which regular expressions are matched, and less than the speed at which regular expression objects are built. Otherwise, it does not have a noticeable effect on the program output.

I was curious what type of optimization would be done, but it didn't seem to have much literature (really, it seems to be undefined), and one of the only things I found was at cppreference.com , which states that std::regex_constants::optimize :

Indicates a regex engine to make matching faster, and the potential cost of slowing down construction. For example, this could mean converting a non-deterministic FSA to a deterministic FSA.

However, I do not have a formal background in computer science, and although I know the basics of what the FSA is, I understand the main difference between a deterministic FSA (each state has only one possible next state) and a non-deterministic FSA (with several potential states); I do not understand how this improves the coordination time. Also, I would be interested to know if there are any other optimizations in various implementations of the C ++ standard library.

+7
source share
2 answers

There is some useful information on the topic of regex engines and performance compromises (much more than they can fit in response to stackoverflow) in Mastering Regular Expressions by Jeffrey Friedl.

It is worth noting that Boost.Regex, which is the source for N3126, documents optimize as "This is not currently valid for Boost.Regex."

PS

indeed, it is apparently determined by implementation

No, it is not specified. Implementation-specific tools determine that an implementation is required to determine the choice of behavior. Implementations are not required to document how their regular expression engines are implemented or what (if any) difference makes the optimize flag.

PS 2

in various STL implementations

std::regex not part of STL, the standard C ++ library is not the same as STL.

+4
source

See http://swtch.com/~rsc/regexp/regexp1.html for a nice explanation of how NFA-based regular expression implementations can avoid the exponential backtracking that occurs in DFA connectors under certain circumstances.

+1
source

All Articles