When to use compiled Regex vs. interpreted?

After reading this article http://www.codinghorror.com/blog/archives/000228.html I understand the benefits of compiled regular expressions a little better. However, in which scenarios would you personally consider a compiled regex mandatory?

For example, I use a regular expression in a loop, and the pattern string is built from different variables on each iteration, so I won't get any improvement by marking this regular expression as compiled, right?


Hi, thanks for your answers. My actual code is not simple and involves an RE created on the fly, so I can't post it. For all intents and purposes, here is an example demonstrating my approach:
foreach (field field in fields.Where(x => x.condition))
    MatchResults = Regex.Match(request.Message, field.RegularExpression);
...
+6
4 answers

There are two ways to "compile" a regular expression in .NET. Regular expressions are always "compiled" before they can be used to find matches. When you instantiate the Regex class without the RegexOptions.Compiled flag, your regular expression is still converted into the internal data structure used by the Regex class. The actual matching runs against this data structure, not against the string representing your regular expression. The data structure persists for as long as your Regex instance lives.

Explicitly creating an instance of the Regex class is preferable to calling the static Regex methods if you use the same regular expression more than once. The reason is that the static methods create a Regex instance anyway and then throw it away. They do maintain a cache of recently compiled regular expressions, but the cache is rather small, and a cache lookup is far more expensive than simply following a reference to an existing Regex instance.
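To illustrate the point about instances versus static methods, here is a minimal sketch. The class name, method names and the `\d+` pattern are made up for the example; only `Regex`, `Regex.Matches` and the static cache behaviour come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReuseSketch
{
    // Built once: the parsed internal representation lives as long as this instance does.
    private static readonly Regex Digits = new Regex(@"\d+");

    // Reuses the same Regex instance for every line.
    public static int CountWithInstance(string[] lines)
    {
        int total = 0;
        foreach (string line in lines)
        {
            total += Digits.Matches(line).Count;
        }
        return total;
    }

    // Uses the static method: every call has to look the pattern up in the static cache.
    public static int CountWithStaticCall(string[] lines)
    {
        int total = 0;
        foreach (string line in lines)
        {
            total += Regex.Matches(line, @"\d+").Count;
        }
        return total;
    }
}
```

Both methods return the same result. The difference is only in how often the pattern has to be looked up or rebuilt, which matters mostly when many different patterns compete for the static cache.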

The above form of compilation exists in every programming language or library that uses regular expressions, although not everyone offers control over it.

The .NET framework provides a second way to compile regular expressions: constructing a Regex object and specifying the RegexOptions.Compiled flag. The absence or presence of this flag does not indicate whether the regular expression is compiled. It indicates whether the regular expression is compiled quickly, as described above, or in full, as described below.

What RegexOptions.Compiled really does is create a new assembly with your regular expression compiled down to MSIL. This assembly is then loaded, compiled to machine code, and becomes a permanent part of your application (for as long as it runs). This process takes a lot of CPU time, and the memory it uses is never released.

You should use RegexOptions.Compiled only if you process so much data with it that the user actually has to wait on your regular expression. If you can't measure the speed difference with a stopwatch, don't bother with RegexOptions.Compiled.
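The "measure it with a stopwatch" advice can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and construction time is deliberately included in the measurement so that the up-front cost of RegexOptions.Compiled shows up in the result.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTimingSketch
{
    // Returns the total match count and the elapsed time, construction included.
    public static (int Matches, TimeSpan Elapsed) Time(
        string pattern, RegexOptions options, string input, int iterations)
    {
        var watch = Stopwatch.StartNew();
        var regex = new Regex(pattern, options); // construction cost is measured on purpose
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            matches += regex.Matches(input).Count;
        }
        watch.Stop();
        return (matches, watch.Elapsed);
    }
}
```

Calling `Time` once with `RegexOptions.None` and once with `RegexOptions.Compiled`, on your real pattern and a realistic amount of input, gives two elapsed times to compare. The numbers depend heavily on the pattern, the input and the runtime, so no typical result is quoted here.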

+11

I would compile an RE when it needs to be used more than two or three times and the compilation cost is more than offset by the improvement in execution time.

I never compile one-off REs, and I always compile those that run more than five times (give or take a couple). I have never found a need for parameterized REs (the need may exist, I have just never run into it), so that doesn't enter into it.

EDIT: The article you reference claims that precompiling is an order of magnitude (ten times) slower to set up than interpretation and only saves 30% at run time. And besides, interpreted REs are cached anyway. So I would say it definitely argues against the indiscriminate use of compilation.

A 30% saving means that recouping the initial compilation cost takes 100/3 (about 33) executions of the compiled RE. This is according to the MSDN documentation on .NET. I always assumed it wasn't that bad for my REs (Python / Perl / Java), but I think I should check.

+2

It seems to me that your expressions are too specific. I would be interested to see a code example of what you are actually trying to parse, because my gut tells me your approach may not be general enough. If that is not the case, the set of expressions can still be precompiled, for example if each of them is matched on every pass through the loop.

Edit your question and add the code so that we can help you.

0

Compiling a regular expression should only be done when the regular expression is complex enough. Simple regular expressions will run more efficiently without compilation, because the compilation time is pure overhead. If a regular expression is very complex but only used once, you should evaluate whether compiling it is worthwhile. You can measure this by setting up a routine that times the two alternatives.

In almost all cases where the regex is used several times, it is worth constructing the regular expression outside the loop.
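For a loop like the one in the question, where the pattern comes from each field, "outside the loop" can mean keeping one Regex per distinct pattern string. The sketch below assumes that the same patterns recur across messages. The `PatternCacheSketch` class and its methods are hypothetical, and whether RegexOptions.Compiled pays off here still has to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternCacheSketch
{
    // One Regex per distinct pattern string. Not thread-safe; for illustration only.
    private static readonly Dictionary<string, Regex> Cache = new Dictionary<string, Regex>();

    // Returns the cached Regex for a pattern, constructing it on first use.
    public static Regex Get(string pattern)
    {
        if (!Cache.TryGetValue(pattern, out Regex regex))
        {
            // Assumption: patterns repeat often enough to justify RegexOptions.Compiled.
            regex = new Regex(pattern, RegexOptions.Compiled);
            Cache[pattern] = regex;
        }
        return regex;
    }

    // Matches every pattern against the message and collects the first match of each.
    public static List<string> MatchAll(string message, IEnumerable<string> patterns)
    {
        var results = new List<string>();
        foreach (string pattern in patterns)
        {
            Match match = Get(pattern).Match(message);
            if (match.Success)
            {
                results.Add(match.Value);
            }
        }
        return results;
    }
}
```

If the pattern strings really are different on every iteration and never repeat, a cache like this buys nothing, and RegexOptions.Compiled would only add its construction cost to every pass.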

0