Perl regular expression substitution using external parameters

Consider the following example:

    my $text = "some_strange_thing";
    $text =~ s/some_(\w+)_thing/no_$1_stuff/;
    print "Result: $text\n";

It prints:

"Result: no_strange_stuff"

So far so good.

Now I need to get match patterns and replacements from external sources (user input, configuration file, etc.). The naive solution is as follows:

    my $match = "some_(\\w+)_thing";
    my $repl  = "no_\$1_stuff";
    my $text  = "some_strange_thing";
    $text =~ s/$match/$repl/;
    print "Result: $text\n";

But this prints:

"Result: no_$1_stuff"

What happened? How can I get the same result with patterns and replacements supplied externally?


Solution 1: String::Substitution

Use the String::Substitution module:

    use String::Substitution qw(gsub_modify);

    my $find = 'some_(\w+)_thing';
    my $repl = 'no_$1_stuff';
    my $text = "some_strange_thing";
    gsub_modify($text, $find, $repl);
    print $text, "\n";

The replacement string only interpolates (the term is used loosely) numbered match variables (for example, $1 or ${12}). See "interpolate_match_vars" in the module documentation for more information.
This module does not save or interpolate $& to avoid the "considerable performance penalty" (see perlvar).
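
To illustrate what "only interpolates numbered match variables" means, here is a minimal core-Perl sketch of the same idea. It is not String::Substitution's actual implementation: the helper names expand_captures and safe_subst are hypothetical, and the sketch replaces only the first match. The point is that $1 and ${1} are filled in by plain string replacement, so nothing from the external template is ever evaluated as Perl code.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $N and ${N} in a template from a list of captures.
# Unknown or undefined groups expand to the empty string.
sub expand_captures {
    my ($template, @captures) = @_;
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        ($n >= 1 && defined $captures[$n - 1]) ? $captures[$n - 1] : '';
    /ge;
    return $template;
}

# Hypothetical helper: replace the first match of $pattern in $text,
# filling the template from that match's capture groups.
sub safe_subst {
    my ($text, $pattern, $template) = @_;
    my $re = qr/$pattern/;
    if ($text =~ $re) {
        # Save the match offsets before any other regex overwrites @- and @+.
        my ($start, $end) = ($-[0], $+[0]);
        my @captures = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        substr($text, $start, $end - $start, expand_captures($template, @captures));
    }
    return $text;
}

print safe_subst('some_strange_thing', 'some_(\w+)_thing', 'no_$1_stuff'), "\n";
# prints: no_strange_stuff
```

The template is treated as data only, which is why this style avoids the code-execution concerns of the eval-based approaches.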

Solution 2: Data::Munge

This is the solution mentioned by Grinnz in the comments below.

Data::Munge can be used as follows:

    use Data::Munge;

    my $find  = qr/some_(\w+)_thing/;
    my $repl  = 'no_$1_stuff';
    my $text  = 'some_strange_thing';
    my $flags = 'g';
    print replace($text, $find, $repl, $flags);
    # => no_strange_stuff

Quick and dirty way (if the replacement does not contain double quotes and security is not considered)

DISCLAIMER: I provide this solution because this approach can be found online, but its caveats are usually not explained. Do not use it in production.

With this approach, you cannot have a replacement string that contains a " double quote. And since it is equivalent to giving whoever writes the configuration file the ability to run arbitrary code, it should not be exposed to web users (as mentioned by Daniel Martin).

You can use the following code:

    #!/usr/bin/perl
    my $match = qr"some_(\w+)_thing";
    my $repl  = '"no_$1_stuff"';
    my $text  = "some_strange_thing";
    $text =~ s/$match/$repl/ee;
    print "Result: $text\n";


Result:

 Result: no_strange_stuff 

You need to:

  1. Declare a replacement in '"..."' so you can evaluate $1 later
  2. Use /ee to force a double evaluation of the variables in the replacement.

A modifier available specifically for search and replace is the s///e evaluation modifier. s///e treats the replacement text as Perl code rather than as a double-quoted string. The value that the code returns is substituted for the matched substring. s///e is useful if you need to do a bit of computation while replacing text.
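
A small sketch of the difference between /e and /ee may help here. The variable names ($price, $doubled, $once, $twice) are illustrative only. With one /e the replacement expression is evaluated once, so a variable holding a quoted template simply yields that template text; the second e evaluates the resulting text again as Perl code, and that is the step that finally interpolates $1.

```perl
use strict;
use warnings;

# /e: the replacement is a Perl expression, here a small computation.
my $price = "price: 21";
(my $doubled = $price) =~ s/(\d+)/$1 * 2/e;

# The template carries its own double quotes, as in the answer above.
my $repl = '"no_$1_stuff"';
my $text = "some_strange_thing";

# One e: the expression $repl is evaluated once and yields its contents verbatim.
(my $once = $text) =~ s/some_(\w+)_thing/$repl/e;

# Two e's: the contents are evaluated again as code, so $1 is interpolated.
(my $twice = $text) =~ s/some_(\w+)_thing/$repl/ee;

print "$doubled\n";   # price: 42
print "$once\n";      # "no_$1_stuff"
print "$twice\n";     # no_strange_stuff
```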

You can use qr to precompile the regular expression pattern (qr"some_(\w+)_thing").


Essentially the same approach as the accepted answer, but I kept the initial lines the same as in the problem statement, since I thought that might make it easier to fit into more situations:

    my $match = "some_(\\w+)_thing";
    my $repl  = "no_\$1_stuff";

    my $qrmatch = qr($match);   # compile the pattern once

    # Escape double quotes and backslashes, then wrap the replacement in quotes
    my $code = $repl;
    $code =~ s/([^"\\]*)(["\\])/$1\\$2/g;
    $code = qq["$code"];
    if (!defined($code)) {
        die "Couldn't find appropriate quote marks";
    }

    my $text = "some_strange_thing";
    $text =~ s/$qrmatch/$code/ee;
    print "Result: $text\n";

Note that this works no matter what is in $repl , whereas a naive solution has problems if $repl contains the double quote character itself or ends with a backslash.

Also, assuming you are going to run the last three lines (or something like them) in a loop, make sure you keep the qr line. Skipping qr and just using s/$match/$code/ee makes a huge difference in performance.

In addition, although it is not so easy to get arbitrary code execution with this solution, as with the accepted one, it will not surprise me if it is still possible. In general, I would avoid s///ee -based solutions if $match or $repl comes from untrusted users. (e.g. do not create a web service from this)
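
As a harmless illustration of that concern, the sketch below reuses the same escaping steps with a hypothetical hostile replacement. The input '@{[ 6 * 7 ]}' is my own example, not from the original answer. It contains no double quotes or backslashes, so the escaping leaves it untouched, yet the second evaluation still runs the embedded expression.

```perl
use strict;
use warnings;

my $match = "some_(\\w+)_thing";
# Hypothetical hostile input: an array-interpolation block containing code.
# Here it only multiplies two numbers, but any expression could go inside.
my $repl  = '@{[ 6 * 7 ]}';

my $qrmatch = qr($match);

# Same escaping as in the answer: only " and \ are neutralised.
my $code = $repl;
$code =~ s/([^"\\]*)(["\\])/$1\\$2/g;
$code = qq["$code"];

my $text = "some_strange_thing";
$text =~ s/$qrmatch/$code/ee;
print "Result: $text\n";   # Result: 42
```

The replacement text was executed as code rather than inserted literally, which is why the /ee approach should stay away from untrusted input.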

Doing this kind of replacement securely when $match and $repl are supplied by untrusted users is a topic for a separate question, which you should ask if your use case includes that.
