Why does this regular expression call substcont an excessive number of times?

This is more out of curiosity than anything else, as I cannot find any useful information on Google about this function (CORE::substcont).

While profiling and optimizing some old, slow XML parsing code, I found that the following regular expression calls substcont 31 times each time the line is executed, and takes a huge amount of time:

Calls: 10,000 Time: 2.65s Sub calls: 320,000 Time in subs: 1.15s

    $handle =~ s/(>)\s*(<)/$1\n$2/g;
    # spent 1.09s making 310000 calls to main::CORE:substcont, avg 4µs/call
    # spent 58.8ms making 10000 calls to main::CORE:subst, avg 6µs/call

Compared to the previous line:

Calls: 10,000 Time: 371ms Sub calls: 30,000 Time in subs: 221ms

    $handle =~ s/(.*)\s*(<\?)/$1\n$2/g;
    # spent 136ms making 10000 calls to main::CORE:subst, avg 14µs/call
    # spent 84.6ms making 20000 calls to main::CORE:substcont, avg 4µs/call

The number of substcont calls is quite surprising, especially since I would have expected the second regular expression to be the more expensive one. This is obviously why profiling is a good thing :-)
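One way to see why the counts differ so much is to count how often each substitution fires on a single line. The sketch below uses a made-up line, since the real input is not shown; it relies on s///g returning the number of substitutions made in scalar context, and each substitution means building the `$1\n$2` replacement once more.

```perl
use strict;
use warnings;

# Hypothetical one-line XML fragment; the question's real input is not shown.
my $line = q{<?xml version="1.0"?> <root> <a>1</a> <b>2</b> </root>};

# In scalar context s///g returns the number of substitutions it made,
# i.e. how many times the "$1\n$2" replacement had to be built.
my $tags   = $line;
my $n_tags = $tags =~ s/(>)\s*(<)/$1\n$2/g;    # one hit per "> <" boundary

my $pi     = $line;
my $n_pi   = $pi =~ s/(.*)\s*(<\?)/$1\n$2/g;   # greedy (.*): at most one hit per "<?"

print "tag-boundary pattern: $n_tags replacements\n";
print "greedy <? pattern:    $n_pi replacements\n";
```

On this sample the first pattern fires 4 times and the second once. On real markup with many tag boundaries per line, the first pattern plausibly fires dozens of times, which is in line with the roughly 31 substcont calls per line in the profile (310,000 / 10,000), versus 2 per line (20,000 / 10,000) for the greedy pattern.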

I subsequently modified both lines to remove the unnecessary backrefs, with dramatic results for the badly behaved line:

Calls: 10,000 Time: 393ms Sub calls: 10,000 Time in subs: 341ms

    $handle =~ s/>\s*</>\n</g;
    # spent 341ms making 10000 calls to main::CORE:subst, avg 34µs/call
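The rewrite is safe because `$1` and `$2` in the original can only ever be `>` and `<`, so the replacement can be a constant string. A quick sanity check on a few made-up inputs (hypothetical, not from the question) that both forms produce identical output:

```perl
use strict;
use warnings;

# Hypothetical sample inputs; any strings would do for this comparison.
my @samples = (
    q{<root>  <a>1</a>   <b>2</b></root>},
    qq{<x>\n\t<y/>\n</x>},
    q{no tags here},
);

my $agree = 0;
for my $s (@samples) {
    (my $old = $s) =~ s/(>)\s*(<)/$1\n$2/g;   # original: captures + backrefs
    (my $new = $s) =~ s/>\s*</>\n</g;         # rewrite: constant replacement
    $agree++ if $old eq $new;
}
printf "%d of %d samples agree\n", $agree, scalar @samples;
```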
So my question is: why does the original make SO many calls to substcont, and what does substcont even do in the regex engine that takes so long?
Tags: optimization, profiling, regex, perl
1 answer

substcont is Perl's internal name for the "substitution iterator", an op used by s///. Based on what little information I have, it seems that substcont is triggered when the replacement uses a backreference, that is, when $1 is present. You can play with it a bit using B::Concise.

Here are the opcodes for a simple substitution without a backref:

    $ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/bar/ig'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <$> const[PV "foo"] s
    4  <#> gvsv[*foo] s
    5  <2> sassign vKS/2
    6  <;> nextstate(main 1 -e:1) v:{
    7  <#> gvsv[*foo] s
    8  <$> const[PV "bar"] s
    9  </> subst(/"(foo)"/) vKS
    a  <@> leave[1 ref] vKP/REFC
    -e syntax OK

And one with a backref:

    $ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/$1/ig'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v:{
    3  <$> const[PV "foo"] s
    4  <#> gvsv[*foo] s
    5  <2> sassign vKS/2
    6  <;> nextstate(main 1 -e:1) v:{
    7  <#> gvsv[*foo] s
    8  </> subst(/"(foo)"/ replstart->9) vKS
    9  <#> gvsv[*1] s
    a  <|> substcont(other->8) sK/1
    b  <@> leave[1 ref] vKP/REFC
    -e syntax OK
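The same comparison can be scripted. This is a sketch that assumes a Unix-like system (it uses list-form pipe open) and made-up one-liners mirroring the question's patterns: it runs `perl -MO=Concise,-exec -e CODE` in a child process and checks whether the op listing contains substcont.

```perl
use strict;
use warnings;

# Run `perl -MO=Concise,-exec -e CODE` in a child process and report whether
# the printed op listing contains a substcont op. The child only compiles
# the code; B::Concise dumps the ops instead of running them.
sub uses_substcont {
    my ($code) = @_;
    open(my $fh, '-|', $^X, '-MO=Concise,-exec', '-e', $code)
        or die "cannot run $^X: $!";
    my $ops = do { local $/; <$fh> } // '';
    close $fh;
    return $ops =~ /\bsubstcont\b/ ? 1 : 0;
}

# Made-up one-liners mirroring the patterns from the question.
my %cases = (
    'constant replacement' => '$_ = "a> <b"; s/>\s*</>\n</g',
    'backref replacement'  => '$_ = "a> <b"; s/(>)\s*(<)/$1\n$2/g',
);

for my $name (sort keys %cases) {
    printf "%-20s substcont: %s\n", $name,
        uses_substcont($cases{$name}) ? 'yes' : 'no';
}
```

If the answer's reading is right, only the backref version should report substcont, which would also fit the question's final profile showing no substcont calls after the rewrite.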

That's all I can offer. You could also try Rx, mjd's old regex debugger.
