How can I efficiently handle multiple Perl search / replace operations on the same line?

So, my Perl script basically takes a string and then tries to clean it up by running multiple search-and-replace operations on it, for example:

$text =~ s/<[^>]+>/ /g;
$text =~ s/\s+/ /g;
$text =~ s/[\(\{\[]\d+[\(\{\[]/ /g;
$text =~ s/\s+[<>]+\s+/\. /g;
$text =~ s/\s+/ /g;
$text =~ s/\.*\s*[\*|\#]+\s*([A-Z\"])/\. $1/g; # replace . **** Begin or . #### Begin or ) *The 
$text =~ s/\.\s*\([^\)]*\) ([A-Z])/\. $1/g; # . (blah blah) S... => . S...

As you can see, I am dealing with nasty HTML and have to beat it into submission.

I hope there is a simpler, more aesthetically appealing way to do this. I have about 50 lines that look just like the ones above.

I solved one version of this problem using a hash where the key is a comment and the value is a regular expression, for example:

%rxcheck = (
    'time of day'=>'\d+:\d+',
    'starts with capital letters then a capital word'=>'^([A-Z]+\s)+[A-Z][a-z]',
    'ends with a single capital letter'=>'\b[A-Z]\.'
);

And here is how I use it:

foreach my $key (keys %rxcheck) {
    if ($snippet =~ /$rxcheck{ $key }/g) { blah blah }
}

The problem arises when I try a hash where the key is the search expression and the value is what I want to replace it with, and the replacement contains a $1 or $2.

%rxcheck2 = (
    '(\w) \"'=>'$1\"'
);

The above should do this:

$snippet =~ s/(\w) \"/$1\"/g;

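But I cannot seem to pass the "$1" part into the regex literally (it seems $1 gets interpolated, even though I used single quotes). So I end up with something like this: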

if($snippet =~ /$key/$rxcheck2{ $key }/g){  }

And that does not work.

So, 2 questions:

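Easy: how do I handle a large number of regexes in an easily editable way, so that I can change and add them without copying and pasting the previous line?

Harder: how do I handle them using a hash (or an array, if I have several pieces to include, such as 1) the part to search for, 2) the replacement, 3) a comment, 4) global / case-insensitive modifiers), if that is in fact the easiest way to do this?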


№ 1

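One common way to cut the repetition in code like this is to move $text into $_, so that instead of writing:

$text =~ s/foo/bar/g;

you can just write:

s/foo/bar/g;

A common idiom for this is a for() loop used as a topicalizer: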

for($text)
{
  s/foo/bar/g;
  s/qux/meh/g;
  ...
}

The loop restores any previous value of $_ when the block ends, so there is no need to localize $_ explicitly.
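
To make the aliasing concrete, here is a minimal sketch of the idiom. The sample string and the three substitutions are illustrative assumptions, not the rules from the question; the point is only that the block edits $text in place and that an outer $_ survives.

```perl
use strict;
use warnings;

# Hypothetical sample input, for illustration only.
my $text = "<p>Hello   <b>world</b></p>";

$_ = "outer value";      # something already stored in $_

for ($text) {            # aliases $_ to $text for the duration of the block
    s/<[^>]+>/ /g;       # replace each tag with a space
    s/\s+/ /g;           # collapse runs of whitespace
    s/^\s+|\s+$//g;      # trim both ends
}

print "$text\n";         # the substitutions modified $text itself
print "$_\n";            # $_ holds its earlier value again
```

Because $_ is an alias rather than a copy, nothing has to be assigned back to $text after the block.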


№ 2

You can precompile the search pattern with qr//:

my $search = qr/(<[^>]+>)/;
$str =~ s/$search/foo,$1,bar/;

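That covers the search side, but not the replacement. As far as I know, qr// cannot hold the replacement part, so it has to be handled some other way. Two options:

1. Use eval() inside the foreach loop over %rxcheck2. Downside: it relies on eval()s.

2. Use an array of anonymous subroutines: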

my @replacements = (
    sub { $_[0] =~ s/<[^>]+>/ /g; },
    sub { $_[0] =~ s/\s+/ /g; },
    sub { $_[0] =~ s/[\(\{\[]\d+[\(\{\[]/ /g; },
    sub { $_[0] =~ s/\s+[<>]+\s+/\. /g },
    sub { $_[0] =~ s/\s+/ /g; },
    sub { $_[0] =~ s/\.*\s*[\*|\#]+\s*([A-Z\"])/\. $1/g; },
    sub { $_[0] =~ s/\.\s*\([^\)]*\) ([A-Z])/\. $1/g; }
);

# Assume your data is in $_
foreach my $repl (@replacements) {
    $repl->($_);
}

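This does mean writing more than a plain list of search/replace pairs (each substitution is wrapped in an anonymous sub).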
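
If you want the table the question asks for (search part, replacement, comment, modifiers), one possible sketch is an ordered array of rules in which the replacement is a small closure. The field names `re`, `with`, `global` and `note`, the helper `apply_rules`, and the sample rules are all assumptions made for this example. Case-insensitivity is set inside qr//, and since the /g flag cannot be stored in a variable, the loop branches on it.

```perl
use strict;
use warnings;

# Each rule: compiled pattern (flags such as /i live inside qr//),
# a replacement closure, a "global" switch, and a free-text comment.
# Field names and rules are hypothetical, for illustration only.
my @rules = (
    { re => qr/<[^>]+>/, with => sub { ' ' },    global => 1, note => 'strip tags' },
    { re => qr/(\w) "/,  with => sub { "$1\"" }, global => 1, note => 'remove space before a quote' },
    { re => qr/\s+/,     with => sub { ' ' },    global => 1, note => 'collapse whitespace' },
    { re => qr/^hello/i, with => sub { 'Hi' },   global => 0, note => 'first match only, case-insensitive' },
);

sub apply_rules {
    my ($text, @list) = @_;
    for my $rule (@list) {
        my ($re, $with) = @{$rule}{qw(re with)};
        if ($rule->{global}) {
            # /e calls the closure for every match; $1 is visible inside it
            $text =~ s/$re/$with->()/ge;
        }
        else {
            $text =~ s/$re/$with->()/e;
        }
    }
    return $text;
}

my $clean = apply_rules('HELLO <b>big</b>   world "', @rules);
print "$clean\n";
```

An array keeps the rules in the order they were written, which matters when a later substitution depends on the result of an earlier one. Adding a rule is then a one-line change to the table.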


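Hashes are a poor fit here because they are unordered. I find that an array of arrays, each holding a compiled regex and a string to eval (in fact a double eval), works well: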

#!/usr/bin/perl

use strict;
use warnings;

my @replace = (
    [ qr/(bar)/ => '"<$1>"' ],
    [ qr/foo/   => '"bar"'  ],
);

my $s = "foo bar baz foo bar baz";

for my $replace (@replace) {
    $s =~ s/$replace->[0]/$replace->[1]/gee;
}

print "$s\n";
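
To see why two evals are involved, the following small sketch compares /e with /ee on a single replacement template. The variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $template = '"<$1>"';   # single quotes: the text $1 is stored literally

# One eval: the replacement expression is $template, so the result
# is the template text itself, quotes and all.
(my $once = 'bar') =~ s/(bar)/$template/e;

# Two evals: the template text is then run as Perl code, which
# interpolates the real $1 from the current match.
(my $twice = 'bar') =~ s/(bar)/$template/ee;

print "$once\n";
print "$twice\n";
```

The first line printed is the raw template, the second is `<bar>`. Since /ee executes the replacement string as code, it should only be used with templates you control.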

Of course, j_random_hacker's solution is much faster, because it avoids /ee:

bar <bar> baz bar <bar> baz
bar <bar> baz bar <bar> baz
         Rate refs subs
refs  10288/s   -- -91%
subs 111348/s 982%   --

Here is the benchmark code:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

my @subs = (
    sub { $_[0] =~ s/(bar)/<$1>/g },
    sub { $_[0] =~ s/foo/bar/g },
);

my @refs = (
    [ qr/(bar)/ => '"<$1>"' ],
    [ qr/foo/   => '"bar"'  ],
);

my %subs = (
    subs => sub {
        my $s = "foo bar baz foo bar baz";
        for my $sub (@subs) {
            $sub->($s);
        }
        return $s;
    },
    refs => sub {
        my $s = "foo bar baz foo bar baz";
        for my $ref (@refs) {
            $s =~ s/$ref->[0]/$ref->[1]/gee;
        }
        return $s;
    }
);

for my $sub (keys %subs) {
    print $subs{$sub}(), "\n";
}

Benchmark::cmpthese -1, \%subs;

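You say you are dealing with HTML. As you are finding out, cleaning it up with regexes alone tends to be fragile.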

A proper HTML parser will make your life easier. HTML::Parser can be difficult to use, but there are other very useful CPAN libraries that I can recommend if you can say what you are trying to do, rather than how.
