How does this regex work?

Question

How does this regex work?

/^1?$|^(11+?)\1+$/ checks if the number (its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of primes.

I don't have enough experience with Perl, but I understand that the regex will be true for a number that is not prime. So, if we print all the numbers that do not express true with this expression, we have a list of primes. That trying to execute perl request.

About the regex part,

^1?$ Part is designed to count 1 as not just

^(11+?)\1+$ is for matching non-prime numbers starting at 4.

What I do not understand is what ? in regular expression is generally necessary. For me, /^1$|^(11+)\1+$/ should be just fine and really

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gives me the same set of primes.

Is there a flaw in my understanding of regex? Why is it needed ? ?

Does not match ? zero or one occurrence of the expression preceding it?

+15

regex perl primes

Lazer Jul 25 '10 at 15:32

source share

2 answers

First one ? will make "" (empty string, unary zero) will not be a prime number. Zero is defined as not simple.

The second is different; he stops the regex from greedy matching. This should significantly improve the performance of the match, since the first part of this section ( (11+) ) will not consume almost the entire line before returning. If you omit the question mark, you effectively check if the odd n is divisible by n-1 , and therefore one down; if you turn it on, you test divisibility into the first two and so on. Obviously, numbers are usually divided into smaller factors more often, so your match will be faster.

+6

Borealid Jul 25 '10 at 15:34

source share

cjm · Accepted Answer · 2010-07-25 15:51

First one ? designed to match an empty string (i.e. 0) as complicated. If you don't care if the regular expression matches 0, then this is optional.

Second one ? Designed for efficiency only. + usually greedy, which means that it matches the number of characters available, and then returns if the rest of the regular expression doesn't match. +? makes it inanimate, so it matches only 1 character, and then tries to match more if the rest of the regular expression doesn't match. (See the Quantifiers perlre section for more on greedy and unwanted matches.)

In this particular regular expression (11+?) Means it checks for divisibility by 2 ( '11' ), then 3 ( '111' ), then 4, etc. If you used (11+) , he checked divisibility by N (the number itself), then N-1, then N-2, etc. Since the divisor should be no more than N / 2, without ? he will spend time testing a lot of "potential" dividers, possibly working. This will still match odd numbers, only slower. (In addition, $1 will be the largest divisor, not the smallest.)

How does this regex work?

More articles: