Perlre length limit

From man perlre:

The "*" quantifier is equivalent to "{0,}", the "+" quantifier to "{1,}", and the "?" quantifier to "{0,1}". n and m are limited to non-negative integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:

  $_ **= $_ , / {$_} / for 2 .. 42; 

Ugh, that is ugly. Is there some kind of constant that I can read instead?

Edit: As Daxim (and perlretut) pointed out, it may be that 32767 is a magic hard-coded number. A bit of searching in the Perl source gets you some of the way, but I'm not sure how to take the next step and actually find out where the default reg_infty or REG_INFTY is set:

 ~/dev/perl-5.12.2 $ grep -ri 'reg_infty.*=' *
 regexec.c:  if (max != REG_INFTY && ST.count == max)
 t/re/pat.t: $::reg_infty = $Config {reg_infty} // 32767;
 t/re/pat.t: $::reg_infty_m = $::reg_infty - 1;
 t/re/pat.t: $::reg_infty_p = $::reg_infty + 1;
 t/re/pat.t: $::reg_infty_m = $::reg_infty_m; # Surpress warning.

Edit 2: DVK is right, of course: it is #define'd at compile time and can probably only be overridden through REG_INFTY.

1 answer

Summary: I can find three ways to determine the limit: empirical, matching the Perl tests, and theoretical.

  • Empirical:

     eval {$_ **= $_ , / {$_} / for 2 .. 129};
     # To be truly portable, the above should ideally loop forever till $@ is true.
     $@ =~ /bigger than (-?\d+) /;
     print "LIMIT: $1\n";

    This seems obvious enough to need no explanation.

  • Matching the Perl tests:

    Perl has a number of tests for regular expressions, some of which (in pat.t) deal with testing this maximum value. So you can treat the maximum value computed in those tests as "good enough" and follow the tests' logic:

     use Config;
     $reg_infty = $Config {reg_infty} // 2 ** 15 - 1; # 32767
     print "Test-based reg_infinity limit: $reg_infty\n";

    Where in the tests this comes from is explained below.

  • Theoretical: This is an attempt to replicate the EXACT logic used by the C code to generate this value.

    This is more complicated than it sounds, because the value is affected by two things: the Perl build configuration and a group of C #defines with branching logic. I was able to dig fairly deep into that logic, but was stopped by two problems. First, the #ifdefs refer to a bunch of tokens that are not actually defined anywhere in the Perl code that I can find, and I don't know how to find out from inside Perl what those defines' values were. Second, the final default value (if I'm right, and those #ifdefs always end up at the default) is #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0), and the actual limit is obtained by shifting 1 bit off the resulting all-ones number (more details below).

    I'm also not sure how to get, from inside Perl, the number of bytes in a short for whichever implementation was used to build the perl executable.

    So, even if the answers to both of these questions can be found (which I'm not sure of), the resulting logic will certainly be uglier and more complex than the simple "empirical, eval-based evaluation" offered as the first option.
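For the empirical option, the comment in the snippet says a truly portable version should keep looping until `$@` is set. A minimal sketch of that idea follows. The sub name `regex_quantifier_limit` and the upper bound of 2**62 are assumptions made for illustration; neither comes from perlre.

```perl
use strict;
use warnings;

# Hypothetical helper: grow the quantifier until regex compilation fails,
# then read the limit out of the error message.
sub regex_quantifier_limit {
    for my $bits (2 .. 62) {            # arbitrary safety cap (an assumption)
        my $n = 2 ** $bits;
        # The pattern interpolates $n, so it is compiled at run time.
        next if eval { my $re = qr/ {$n}/; 1 };
        return $1 if $@ =~ /bigger than (-?\d+)/;
        return;                         # compilation failed for another reason
    }
    return;                             # no limit hit within the cap
}

my $limit = regex_quantifier_limit();
print defined $limit ? "LIMIT: $limit\n" : "No limit found\n";
```

Returning undef instead of dying keeps the "no limit found" case distinguishable from a real number, which matters on builds where the compiled-in value is unusual.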

Below I will talk about where the various bits and pieces of logic associated with this limit live in Perl code, as well as my attempts to come up with a “theoretically correct” solution, corresponding to the C logic.


OK, here is a partial investigation. You can either finish it yourself from where I got to, or I will finish it later:

  • From regcomp.c : vFAIL2("Quantifier in {,} bigger than %d", REG_INFTY - 1);

    So the limit is obviously taken from the REG_INFTY define, which is declared in:

  • regcomp.h:

      /* XXX fix this description.
         Impose a limit of REG_INFTY on various pattern matching operations
         to limit stack growth and to avoid "infinite" recursions.
      */
      /* The default size for REG_INFTY is I16_MAX, which is the same as
         SHORT_MAX (see perl.h). Unfortunately I16 isn't necessarily 16 bits
         (see handy.h). On the Cray C90, sizeof(short)==4 and hence I16_MAX is
         ((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
         ((1<<63)-1). To limit stack growth to reasonable sizes, supply a
         smaller default.
             --Andy Dougherty 11 June 1998
      */
      #if SHORTSIZE > 2
      #  ifndef REG_INFTY
      #    define REG_INFTY ((1<<15)-1)
      #  endif
      #endif

      #ifndef REG_INFTY
      #  define REG_INFTY I16_MAX
      #endif

    Note that SHORTSIZE can be overridden through Config. I will leave out the details of that, but the logic will need to include $Config{shortsize} :)

  • From handy.h (at first glance this is not part of the Perl source, so it looks like an iffy step):

      #if defined(UINT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX)
      #define I16_MAX INT16_MAX
      #else
      #define I16_MAX PERL_SHORT_MAX
      #endif

  • I could not find ANY place that defines INT16_MAX at all :(

    Someone help, please!

  • PERL_SHORT_MAX is defined in perl.h:

      #ifdef SHORT_MAX
      #  define PERL_SHORT_MAX ((short)SHORT_MAX)
      #else
      #  ifdef MAXSHORT    /* Often used in <values.h> */
      #    define PERL_SHORT_MAX ((short)MAXSHORT)
      #  else
      #    ifdef SHRT_MAX
      #      define PERL_SHORT_MAX ((short)SHRT_MAX)
      #    else
      #      define PERL_SHORT_MAX ((short) (PERL_USHORT_MAX >> 1))
      #    endif
      #  endif
      #endif

    I have not yet been able to find any place where SHORT_MAX, MAXSHORT or SHRT_MAX is defined. So for now I assume that the default, ((short) (PERL_USHORT_MAX >> 1)), is what gets used :)

  • PERL_USHORT_MAX is defined in much the same way in perl.h, and again I could not find a trace of a definition of USHORT_MAX / MAXUSHORT / USHRT_MAX.

    It looks like it falls through to the default: #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). I have no clue how to extract this value from Perl. It is basically the number you get by bitwise-negating a short 0, so if unsigned short is 16 bits, PERL_USHORT_MAX will be 16 ones and PERL_SHORT_MAX will be 15 ones, i.e. 2^15 - 1, i.e. 32767.

  • In addition, from t/re/pat.t (regex tests): $::reg_infty = $Config {reg_infty} // 32767; (this illustrates where a non-default, compiled-in value is stored).
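The arithmetic behind the PERL_USHORT_MAX and PERL_SHORT_MAX defaults can be sketched in Perl. This is only an illustration under two assumptions: bytes are 8 bits wide, and the header logic really does fall through to the `((unsigned short)~(unsigned)0)` default. `$Config{shortsize}` is the size of a C short recorded when perl was built; the variable names are made up for this example.

```perl
use strict;
use warnings;
use Config;

# Size of a C short in bytes, as recorded at build time (fall back to 2).
my $shortsize = $Config{shortsize} // 2;
my $bits      = 8 * $shortsize;          # assumes 8-bit bytes

# (unsigned short)~(unsigned)0 : every one of the $bits bits is set.
my $ushort_max = 2 ** $bits - 1;

# PERL_USHORT_MAX >> 1 : drop one bit, leaving $bits - 1 ones.
my $short_max = 2 ** ($bits - 1) - 1;

print "shortsize=$shortsize ushort_max=$ushort_max short_max=$short_max\n";
```

On a build where a short is 2 bytes this gives 65535 and 32767. For an 8-byte short the values exceed what a Perl integer can hold exactly, so the sketch is only meaningful for the common small sizes.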

So, to get your constant, follow these steps:

 use strict;
 use warnings;
 use feature 'say';
 use Config;

 my $shortsize = $Config{shortsize} // 2;

 # Rough stand-in for the PERL_SHORT_MAX logic in perl.h, which I'm not sure
 # how to extract into Perl with any precision, due to a bunch of never-seen
 # "#define"s. It assumes the ((short) (PERL_USHORT_MAX >> 1)) default and
 # 8-bit bytes, i.e. 2**7-1 if shortsize==1 and 2**15-1 if shortsize==2.
 sub get_PERL_SHORT_MAX { return 2 ** (8 * $shortsize - 1) - 1 }

 my $c_reg_infty = defined $Config{reg_infty} ? $Config{reg_infty}
                 : $shortsize > 2             ? 2**15 - 1   # (1<<15)-1 in regcomp.h
                 :                              get_PERL_SHORT_MAX();

 say "REAL reg_infinity based on C headers: $c_reg_infty";
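
As a sanity check, the empirical figure and the pat.t-style fallback can be put side by side. Per the vFAIL2 call quoted earlier, the error message prints REG_INFTY - 1, so adding 1 to the number in the message should give the compiled-in REG_INFTY. This is a sketch only; the variable names are hypothetical and the 2**62 cap is an arbitrary assumption.

```perl
use strict;
use warnings;
use Config;

# Provoke "Quantifier in {,} bigger than N"; N is REG_INFTY - 1.
my $printed_limit;
for my $bits (8 .. 62) {
    my $n = 2 ** $bits;
    next if eval { my $re = qr/ {$n}/; 1 };
    ($printed_limit) = $@ =~ /bigger than (-?\d+)/;
    last;
}

my $empirical_reg_infty = defined $printed_limit ? $printed_limit + 1 : undef;
my $test_reg_infty      = $Config{reg_infty} // 32767;  # same fallback as t/re/pat.t

printf "empirical REG_INFTY: %s\n", $empirical_reg_infty // 'unknown';
printf "pat.t-style value:   %s\n", $test_reg_infty;
print  "The two values disagree on this build\n"
    if defined $empirical_reg_infty && $empirical_reg_infty != $test_reg_infty;
```

If the two values differ, the empirical one reflects what the running perl actually enforces, which is another argument for the eval-based approach.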