Regex: match everything up to a specific word without non-greedy operators

I want to match everything up to a specific word (for example, the closing C comment delimiter */ ), but for performance reasons I don't want to use non-greedy operators.

For example, matching C comments with /\*.*?\*/ is too slow for my files. Is it possible to improve performance?

+7
c# regex
2 answers

Sure, use the unrolling-the-loop technique:

 /\*[^*]*(?:\*(?!/)[^*]*)*\*/ 

See the regex demo.

The unrolling-the-loop technique is based on the hypothesis that, in a repeated alternation, you usually know which case is the most common and which one is exceptional. We call the first the normal case and the second the special case. The general syntax of the technique can then be written as:

normal* ( special normal* )*

This can be read as: match the normal case; if you find a special case, match it, then match the normal case again. Note that part of this syntax could potentially lead to a super-linear match. To avoid a never-ending match, apply the following rules carefully:

  • the start of the special case and the start of the normal case must be mutually exclusive
  • special must always match at least one character
  • the special expression must be atomic: note that ( special normal* )* can reduce to (special)* , so if special is itself of the form special* , the result resembles (a*)* , which is a non-deterministic expression
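
To make the template concrete, here is a minimal sketch that assembles normal* ( special normal* )* from two sub-patterns. The Unroll helper is hypothetical (it is not part of the answer or of .NET), it does not check the three rules above, and the snippet assumes a C# version with top-level statements (.NET 6 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds normal*(?:special normal*)* from two
// sub-patterns. The caller is responsible for the rules listed above.
static string Unroll(string normal, string special) =>
    $"{normal}*(?:{special}{normal}*)*";

// For C comments: normal  = any character except '*'
//                 special = a '*' that is not followed by '/'
string body = Unroll(@"[^*]", @"\*(?!/)");
string pattern = @"/\*" + body + @"\*/";

Console.WriteLine(pattern);
Console.WriteLine(Regex.Match("int x; /* a * b */ int y;", pattern).Value);
```

The first line printed should be the same pattern as in the answer, and the second the comment /* a * b */ .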

C# pattern declaration (using a verbatim string literal):

 var pattern = @"/\*[^*]*(?:\*(?!/)[^*]*)*\*/"; 

Regex breakdown:

  • /\* - literal /*
  • [^*]* - 0 or more characters other than *
  • (?:\*(?!/)[^*]*)* - 0 or more sequences of:
    • \*(?!/) - a literal * that is not followed by /
    • [^*]* - 0 or more characters other than *
  • \*/ - literal */
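
The same breakdown can be kept next to the pattern itself. This is only a sketch, assuming top-level statements (.NET 6 or later); the variable names and the sample text are made up for illustration. It relies on RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the answer, spread over several commented lines.
var commentRegex = new Regex(@"
    /\*            # literal /*
    [^*]*          # normal: 0 or more characters other than *
    (?:
        \*(?!/)    # special: a * that is not followed by /
        [^*]*      # normal again
    )*
    \*/            # literal */
", RegexOptions.IgnorePatternWhitespace);

// Illustrative input: two comments, the second one spans a line break.
string source = "a = 1; /* first */ b = 2; /* second * with\nstar */";
foreach (Match m in commentRegex.Matches(source))
{
    Console.WriteLine(m.Value);
}
```

Because [^*] also matches line breaks, multi-line comments are handled without RegexOptions.Singleline.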

Here is a graph showing how efficient the 3 potentially identical regular expressions are (tested at regexhero.net *):

[Graph image not preserved]

* Tested against the input /* Comment * Typical * Comment */
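
If you want to repeat the comparison locally, a rough timing sketch could look like the one below. Which three patterns the graph compared is not stated, so the list here is an assumption (the lazy pattern from the question, the unrolled pattern, and an alternation-based variant). The iteration count is arbitrary, and a Stopwatch loop like this is only a coarse measurement, not a proper benchmark. It assumes top-level statements (.NET 6 or later).

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string input = "/* Comment * Typical * Comment */";
string[] patterns =
{
    @"/\*.*?\*/",                      // lazy quantifier from the question
    @"/\*[^*]*(?:\*(?!/)[^*]*)*\*/",   // unrolled loop
    @"/\*(?:[^*]|\*(?!/))*\*/",        // alternation inside a loop (assumed third candidate)
};
const int iterations = 100_000;        // arbitrary, adjust to taste

foreach (string p in patterns)
{
    var regex = new Regex(p);
    regex.IsMatch(input);              // warm-up so construction cost is not timed
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        regex.Match(input);
    }
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedMilliseconds,6} ms  {p}");
}
```

Timings depend on the machine, the runtime and the input, so measure against your real files before drawing conclusions.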

+7

Try the following:

/\*(?:[^*]|\*(?!/))*\*/

I don't know whether this is faster than stribizhev's answer.
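
Speed aside, a quick way to check that this pattern and the unrolled one find the same text is to run both over a few edge cases. This is a sketch with made-up sample strings, assuming top-level statements (.NET 6 or later); agreement on these samples is not a proof of equivalence.

```csharp
using System;
using System.Text.RegularExpressions;

var unrolled = new Regex(@"/\*[^*]*(?:\*(?!/)[^*]*)*\*/");
var alternation = new Regex(@"/\*(?:[^*]|\*(?!/))*\*/");

// Illustrative edge cases: empty comment, stars only, a star right
// before the closing delimiter, and an unterminated comment.
string[] samples = { "/**/", "/***/", "x /* a **/ b */", "/* unterminated *" };

bool allAgree = true;
foreach (string s in samples)
{
    string a = unrolled.Match(s).Value;     // empty string when there is no match
    string b = alternation.Match(s).Value;
    allAgree &= a == b;
    Console.WriteLine($"{s,-20} -> '{a}' / '{b}'");
}
Console.WriteLine(allAgree);
```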

+1