How can interpolation in a regular expression be deferred until the point of use?

For example, suppose I have a set of variables and an array of regular expressions that interpolate these variables:

    my ($var1, $var2, $var3);
    my @search_regexes=(
        qr/foo $var1/,
        qr/foo bar $var2/,
        qr/foo bar baz $var3/,
    );

The code above warns that $var1, $var2 and $var3 are not defined at the point where the regular expressions in @search_regexes are compiled. However, I want to defer the variable interpolation in these regular expressions until they are used (or until they are (re)compiled later, once the variables have values):

    # Later on we assign a value to $var1 and search for the first regex in $_ ...
    $var1='Hello';
    if (/$search_regexes[0]/) {
        # Do something ...
    }

How can I rewrite the construct in the code sample above to allow this?

As a bonus, I would like each regular expression to be compiled once values have been assigned to the variable(s) appearing in it, the same way the qr// operator compiles it now (but too early). If you can show how to extend the solution to do this, I would really appreciate it.

Update:

I settled on a variation of Hunter's approach because, with it, I take no performance hit and need only minimal changes to my existing code. The other answers also taught me quite a bit about alternative solutions to this problem and their performance implications when matching over a lot of lines. My code now resembles the following:

    my ($var1, $var2, $var3);
    my @search_regexes=(
        sub {qr/foo $var1/},
        sub {qr/foo bar $var2/},
        sub {qr/foo bar baz $var3/},
    );
    ...
    ($var1,$var2,$var3)=qw(Hello there Mr);
    my $search_regex=$search_regexes[$based_on_something]->();
    while (<>) {
        if (/$search_regex/) {
            # Do something ...
            # and sometimes change $search_regex to be another from the array
        }
    }

This gives me what I was looking for with minimal changes to my code (i.e. just adding the subs to the array at the top) and no performance hit at the point where the regular expressions are used.

+8
regex perl
5 answers

I think that if you wrap each regular expression in an anonymous sub, you get the deferral you want:

    my ($var1, $var2, $var3);
    my @search_regexes=(
        sub { return qr/foo $var1/ },
        sub { return qr/foo bar $var2/ },
        sub { return qr/foo bar baz $var3/ },
    );

Then, when you want to evaluate them, you simply call the anonymous sub:

    ($var1, $var2, $var3) = qw(thunk this code);
    if( $_ =~ $search_regexes[0]->() ) {
        # Do something
    }

I know that in Scheme this is called thunking. I'm not sure if it has a name in Perl. You can do something similar in Ruby with Proc objects.
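As a minimal sketch of why the thunk helps (the test string and the names $first and $second are made up for illustration): each call runs qr// afresh, so the compiled pattern reflects whatever the variable holds at the moment of the call, and a pattern that has already been compiled does not change afterwards.

```perl
use strict;
use warnings;

my ($var1, $var2);

# Each element is a thunk: the qr// inside runs only when the sub is called,
# so nothing is interpolated (and nothing warns) at this point.
my @search_regexes = (
    sub { qr/foo $var1/ },
    sub { qr/foo bar $var2/ },
);

$var1 = 'Hello';
my $first = $search_regexes[0]->();     # compiled now, with $var1 eq 'Hello'

$var1 = 'World';
my $second = $search_regexes[0]->();    # compiled again, with the new value

# $first is a snapshot: it still matches 'foo Hello', while $second does not.
my @results = map { 'foo Hello' =~ $_ ? 1 : 0 } ($first, $second);
print "@results\n";                     # prints "1 0"
```

Calling the thunk once and reusing the returned pattern inside a loop keeps the compilation out of the loop.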

+8

The best solution is to delay compiling the regular expression until the variables are defined. But first, a questionable solution: regular expressions may include code: qr/foo (??{ $var1 })/ . The block is executed during the match, and the result of the block is then used as a pattern.

So how can compilation be delayed?

  • By only specifying the regexes once the variables have been assigned. This is less of a problem than one might think, since any program can be expressed without (re)assigning variables. Stick to the rule that every declaration must also be an assignment (and vice versa), and that should work. This:

      my $var1;
      my $re = qr/$var1/;
      $var1 = ...;
      $bar =~ $re;

    becomes:

      my $var1 = ...;
      my $re = qr/$var1/;
      $bar =~ $re;

  • If this is not possible, we can use a closure that we evaluate just before matching:

      my $var1;
      my $deferred_re = sub { qr/$var1/ };
      $var1 = ...;
      $bar =~ $deferred_re->();

    Of course, this recompiles the regex on every call.

  • We can extend the previous idea by caching a regular expression:

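      package DeferredRegexp;

      # The 'qr' overload is called whenever the object is used as a regex.
      # The callback runs on first use only; its result is cached in slot 0.
      use overload 'qr' => sub {
          my ($self) = @_;
          return $self->[0] //= $self->[1]->();
      };

      sub new {
          my ($class, $callback) = @_;
          return bless [undef, $callback] => $class;
      }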

    Then:

      my $var1;
      my $deferred_re = DeferredRegexp->new(sub{ qr/$var1/ });
      $var1 = ...;
      $bar =~ $deferred_re;
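One consequence of the caching variant is worth spelling out: once the pattern has been compiled on first use, later changes to the variable are no longer picked up. The sketch below repeats the class and adds a hypothetical counter, $compiles, purely to make that visible; it assumes Perl 5.12 or later, where the 'qr' overload key is available.

```perl
use 5.012;
use strict;
use warnings;

package DeferredRegexp;
use overload 'qr' => sub {
    my ($self) = @_;
    return $self->[0] //= $self->[1]->();   # compile on first use, then reuse
};
sub new {
    my ($class, $callback) = @_;
    return bless [undef, $callback] => $class;
}

package main;

my $var1;
my $compiles = 0;    # illustration only: counts how often the callback runs
my $deferred_re = DeferredRegexp->new(sub { $compiles++; qr/foo $var1/ });

$var1 = 'Hello';
my $first = 'foo Hello' =~ $deferred_re ? 1 : 0;    # triggers compilation

$var1 = 'World';
my $second = 'foo Hello' =~ $deferred_re ? 1 : 0;   # reuses the cached pattern

print "$first $second $compiles\n";
```

Under these assumptions both matches should succeed and the callback should run only once, because the second match still uses the pattern built from 'Hello'. If the variables can change after first use, the uncached closure from the previous bullet is the safer choice.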
+11

(??{ }) does exactly what you ask for.

    our $var1;
    my $re = qr/foo (??{ $var1 })/;
    ...
    local $var1 = ...;
    /$re/

But that is rather awkward. What you have is what is called a template. Numerous template systems are available to make this cleaner.

    use Template;

    my $pat_template = 'foo [% var1 %]';
    ...
    # Pass a reference so the string is treated as template text, not a filename
    Template->new->process(\$pat_template, { var1 => ... }, \my $pat);
    /$pat/

If the template does not need to be stored in a file, you can use a sub as the builder.

    my $re_gen = sub { my ($var1) = @_; qr/foo $var1/ };
    ...
    my $re = $re_gen->(...);
    /$re/

Note: inside (??{ }) you may run into problems when using lexical variables declared outside the block. This is why I used a package variable in the first snippet.
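A small, runnable sketch of the (??{ }) approach with a package variable, assuming made-up test data: the pattern is compiled once, but the block is run again on every match, so it sees the current value each time.

```perl
use strict;
use warnings;

our $var1;    # package variable, as recommended in the note above

# Compiling is safe while $var1 is still undefined:
# the block is only executed during a match.
my $re = qr/foo (??{ $var1 })/;

$var1 = 'Hello';
my $hit_hello = 'foo Hello' =~ $re ? 1 : 0;

$var1 = 'World';
my $hit_world = 'foo Hello' =~ $re ? 1 : 0;

print "$hit_hello $hit_world\n";    # prints "1 0"
```

The price is that the block's result has to be turned into a pattern during each match, which is part of why the other answers prefer to defer compilation instead.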

+3

Amon's answer is the most complete. However, the question is, why do you want to precompile your regular expressions if you are not 100% sure what they should be?

Like any compilation, everything must be resolved at compile time. You can, as amon showed you, specify your regular expression with variables, but this will recompile your regular expression every time you call it.

I suspect that you are less worried about compile time than about simple reuse. If you use these regular expressions over and over, is it not better to have only one place where they are maintained?

Well, that sounds like a subroutine:

    sub test_regex {
        my $test_val  = shift;
        my $regex_val = shift;
        my $regex_num = shift;

        if ( not defined $regex_num ) {    # Need all three parameters
            die qq(Invalid call to subroutine test_regex);
        }
        if ( $regex_num == 0 ) {
            return $test_val =~ /foo $regex_val/;
        }
        elsif ( $regex_num == 1 ) {
            return $test_val =~ /foo bar $regex_val/;
        }
        elsif ( $regex_num == 2 ) {
            return $test_val =~ /foo bar baz $regex_val/;
        }
        else {
            die qq(Invalid value for regular expression value);
        }
    }

Now you can call test_regex as follows:

    use feature 'say';    # needed for say()

    if ( test_regex ( $_, $var1, 1 ) ) {
        say "This is a regular expression match!";
    }
    else {
        say "No it didn't match";
    }

You have a single place where the regular expressions are maintained (in your subroutine), but you can still call them again and again. Note that I need to pass three parameters: what I'm testing (it may be $_ , but maybe not), the value of $var1 and the number of the regular expression.

I could use global values in my subroutine, but this is usually a bad idea:

    sub test_regex {
        my $regex_num = shift;    # Only thing I need. I'm assuming `$_` and `$val1` are global

        if ( not defined $regex_num ) {    # Need the regex number
            die qq(Invalid call to subroutine test_regex);
        }
        if ( $regex_num == 0 ) {
            return /foo $val1/;
        }
        elsif ( $regex_num == 1 ) {
            return /foo bar $val1/;
        }
        elsif ( $regex_num == 2 ) {
            return /foo bar baz $val1/;
        }
        else {
            die qq(Invalid value for regular expression value);
        }
    }

Then the call will be:

    $val1 = 'fubar';
    if ( test_regex( 1 ) ) {
        ....
    }

This is more like what you had, but it is not a good idea.

+2

If these are just character strings, try wrapping them in single quotes:

    my ($var1, $var2, $var3);
    my @search_regexes=(
        'foo $var1',
        'foo bar $var2',
        'foo bar baz $var3',
    );

This will certainly keep the variables from being interpolated when the array is defined, but I cannot promise that they will be interpolated when you actually use them. Anyway, try it.
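A quick check, with made-up test data, suggests why this does not work on its own: when the single-quoted string is interpolated into a match, its text is inserted once and not interpolated a second time, so the regex engine reads the `$` as an end-of-string anchor followed by the literal text `var1`. The substitution step below, with its hypothetical %vars lookup table, is one way to expand the placeholders by hand; it is an illustration rather than part of the original suggestion.

```perl
use strict;
use warnings;

my $var1    = 'Hello';
my $pattern = 'foo $var1';    # single quotes: stored literally

# The embedded "$var1" is NOT interpolated here; the engine sees
# "foo ", an end-of-string anchor, then the literal text "var1".
my $direct = 'foo Hello' =~ /$pattern/ ? 1 : 0;

# Expanding the placeholders explicitly before matching
# (%vars is a hypothetical lookup table for this sketch).
my %vars = ( var1 => $var1 );
( my $expanded = $pattern ) =~ s/\$(\w+)/$vars{$1}/g;
my $expanded_hit = 'foo Hello' =~ /$expanded/ ? 1 : 0;

print "$direct $expanded_hit\n";    # prints "0 1"
```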

-2