Your expressions are not equivalent
This:
$string=~s/^.+\///; $string=~s/\.shtml//;
removes the text .shtml and everything up to and including the last slash.
This:
$string=~s/(^.+\/|\.shtml)//;
removes either the text .shtml or everything up to and including the last slash, but not both: without /g, the substitution stops after the first match.
This is one problem with combining regular expressions: a single complex regular expression is harder to write, harder to understand, and harder to debug than a few simple ones.
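To make the difference concrete, here is a minimal sketch that wraps both versions in subs and runs them on the same input. The sub names and the sample path are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Two separate substitutions: strips the directory part AND the extension.
sub strip_two_step {
    my ($string) = @_;
    $string =~ s/^.+\///;
    $string =~ s/\.shtml//;
    return $string;
}

# One combined substitution: without /g, only the first match is removed.
sub strip_combined {
    my ($string) = @_;
    $string =~ s/(^.+\/|\.shtml)//;
    return $string;
}

my $path = '/var/www/index.shtml';    # hypothetical sample input
print strip_two_step($path), "\n";    # prints "index"
print strip_combined($path), "\n";    # prints "index.shtml"
```

Adding /g to the combined version would make it remove both parts for this input, but that is yet another expression with its own behavior, so it should be tested rather than assumed equivalent.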
It probably doesn't matter which is faster
Even if your expressions were equivalent, choosing one over the other would probably not have a significant impact on the speed of your program. In-memory operations such as s/// are significantly faster than file I/O, and you indicated that you do a lot of file I/O.
You should profile your application with Devel::NYTProf to see whether these particular substitutions are really a bottleneck (I doubt they are). Do not waste time optimizing things that are already fast.
Alternation hinders the optimizer
Keep in mind that you are comparing apples and oranges, but if you are still curious about performance, you can see how perl evaluates a particular regular expression by using the re pragma:
$ perl -Mre=debug -e'$_ = "foobar"; s/^.+\///; s/\.shtml//;'
...
Guessing start of match in sv for REx "^.+/" against "foobar"
Did not find floating substr "/"...
Match rejected by optimizer
Guessing start of match in sv for REx "\.shtml" against "foobar"
Did not find anchored substr ".shtml"...
Match rejected by optimizer
Freeing REx: "^.+/"
Freeing REx: "\.shtml"
The regex engine has an optimizer. The optimizer looks for substrings that must appear in the target string; if those substrings cannot be found, the match fails immediately, without checking the other parts of the regular expression.
With /^.+\// the optimizer knows that $string must contain at least one forward slash in order to match; when it finds no slashes, it rejects the match immediately, without invoking the full regex engine. A similar optimization occurs with /\.shtml/ .
Here is what perl does with the combined regex:
$ perl -Mre=debug -e'$_ = "foobar"; s/(?:^.+\/|\.shtml)//;'
...
Matching REx "(?:^.+/|\.shtml)" against "foobar"
   0 <> <foobar>      |  1:BRANCH(7)
   0 <> <foobar>      |  2:  BOL(3)
   0 <> <foobar>      |  3:  PLUS(5)
                           REG_ANY can match 6 times out of 2147483647...
                           failed...
   0 <> <foobar>      |  7:BRANCH(11)
   0 <> <foobar>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
   1 <f> <oobar>      |  1:BRANCH(7)
   1 <f> <oobar>      |  2:  BOL(3)
                           failed...
   1 <f> <oobar>      |  7:BRANCH(11)
   1 <f> <oobar>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
   2 <fo> <obar>      |  1:BRANCH(7)
   2 <fo> <obar>      |  2:  BOL(3)
                           failed...
   2 <fo> <obar>      |  7:BRANCH(11)
   2 <fo> <obar>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
   3 <foo> <bar>      |  1:BRANCH(7)
   3 <foo> <bar>      |  2:  BOL(3)
                           failed...
   3 <foo> <bar>      |  7:BRANCH(11)
   3 <foo> <bar>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
   4 <foob> <ar>      |  1:BRANCH(7)
   4 <foob> <ar>      |  2:  BOL(3)
                           failed...
   4 <foob> <ar>      |  7:BRANCH(11)
   4 <foob> <ar>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
   5 <fooba> <r>      |  1:BRANCH(7)
   5 <fooba> <r>      |  2:  BOL(3)
                           failed...
   5 <fooba> <r>      |  7:BRANCH(11)
   5 <fooba> <r>      |  8:  EXACT <.shtml>(12)
                           failed...
                         BRANCH failed...
Match failed
Freeing REx: "(?:^.+/|\.shtml)"
Notice how much longer the output is. Because of the alternation, the optimizer does not kick in and the full regex engine is run. In the worst case (no match), each branch of the alternation is tried at every position in the string. That is not very efficient.
So alternation is slower, right? No, because...
It depends on your data.
Again, we are comparing apples and oranges, but with:
$string = 'a/really_long_string';
the combined regex can be faster, because with s/\.shtml// the optimizer has to scan most of the string before rejecting the match, while the combined regex matches quickly.
You can benchmark this for fun, but it is practically pointless, as you are comparing different things.
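If you do want to try it, a rough sketch using the core Benchmark module might look like the following. The sub names are hypothetical and the input is the sample string from above; the two subs only happen to return the same result for this particular input, and the timings will vary with your data and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'a/really_long_string';

# Two simple substitutions, applied one after the other to a fresh copy.
sub two_step {
    my $s = $input;
    $s =~ s/^.+\///;
    $s =~ s/\.shtml//;
    return $s;
}

# One substitution with alternation, applied to a fresh copy.
sub combined {
    my $s = $input;
    $s =~ s/(?:^.+\/|\.shtml)//;
    return $s;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    two_step => \&two_step,
    combined => \&combined,
});
```

Swap in strings that resemble your real data before drawing any conclusions, and treat the numbers as a curiosity rather than a reason to pick one form over the other.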