(.*) means the engine has to consider any number of occurrences of "SCF SF" before it settles on the one that marks the next capture. Even if you make it non-greedy, you are still allowing for the possibility that "SCF SF" appears inside the capture after "FF". I think you are handling a lot of cases that you don't need to.
The best way to optimize a regular expression sometimes makes it more cryptic, but you can definitely find ways to make the expression fail earlier. (.*?), while not "greedy", is definitely too tolerant.
Below is a more verbose, but faster, alternative for the second capture.
((?:[^S]|S[^C]|SC[^F]|SCF[^ ]|SCF [^S]|SCF S[^F])*)
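To make the difference concrete, here is a minimal sketch that runs the lazy capture and the exhaustive alternation against the same line. The sample line and the variable names are assumptions for illustration, not taken from the question's real data.

```perl
use strict;
use warnings;

# Hypothetical input line; the real data format is an assumption.
my $line = 'FF Some SCript SCF SF rest';

# Lazy version: tolerant, it tries the delimiter at every position.
my $lazy = qr{FF (.*?) SCF SF};

# Exhaustive version: each alternative only consumes text that cannot
# be the start of the "SCF SF" delimiter.
my $strict = qr{FF ((?:[^S]|S[^C]|SC[^F]|SCF[^ ]|SCF [^S]|SCF S[^F])*) SCF SF};

my ($lazy_cap)   = $line =~ $lazy;
my ($strict_cap) = $line =~ $strict;

print "lazy:   '$lazy_cap'\n";     # lazy:   'Some SCript'
print "strict: '$strict_cap'\n";   # strict: 'Some SCript'
```

Both patterns capture the same text on valid input. The point of the longer one is how quickly it gives up on input that does not match.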
But you can optimize it even more if you decide that the string \bSCF\b should automatically commit the match, so that only "\bSCF SF\b" can follow. Then you can rewrite it as:
((?:[^S]|S[^C]|SC[^F]|SCF\B)*) SCF SF
But you can optimize these expressions even more by controlling backtracking, if you believe there is no way SCF could ever appear as a word in valid input without being followed by SF. To do this, you wrap the group in another one, using the brackets (?> and ).
(?>((?:[^S]|S[^C]|SC[^F]|SCF\B)*))SCF SF
This means the matching logic will never go back and reconsider what it has captured. Because the group can no longer give back the space it consumed before SCF, the delimiter is now written without a leading space. If the characters after the capture are not "SCF SF", the whole expression fails. And it fails long before it ever tries to accommodate MV and the other subexpressions.
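A small sketch of that commit behavior, assuming two made-up lines: one valid, and one where SCF appears as a word without SF after it. The helper `capture` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical lines; the data format is an assumption.
my $good = 'FF abc SCF SF tail';
my $bad  = 'FF abc SCF XX SCF SF tail';   # SCF as a word, not followed by SF

my $lazy   = qr{^FF (.*?) SCF SF};
my $atomic = qr{^FF (?>((?:[^S]|S[^C]|SC[^F]|SCF\B)*))SCF SF};

# Return the first capture, or undef when the pattern does not match.
sub capture {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line ($good, $bad) {
    my $lazy_res   = capture($lazy,   $line) // 'no match';
    my $atomic_res = capture($atomic, $line) // 'no match';
    print "$line\n  lazy:   '$lazy_res'\n  atomic: '$atomic_res'\n";
}
```

On the second line the lazy pattern still matches by swallowing "SCF XX" into the capture, while the atomic pattern stops at the first word SCF and rejects the line. Note that the atomic capture keeps the trailing space before the delimiter.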
In fact, given certain assumptions about the uniqueness of the delimiters, the fastest form of this expression would be:
$text_normal = qr{^(\/F\d+) FF (?>((?:[^S]|S[^C]|SC[^F]|SCF\B)*))SCF SF (?>((?:[^M]|M[^V]|MV\B)*))MV (?>(\((?:[^S]|S[^H]|SH.)*))SH$};
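As a usage sketch, the expression can be applied like this. The sample line is a guess at what the input might look like, not real data from the question. Since each atomic capture keeps the space before its delimiter, the sketch trims the fields afterwards.

```perl
use strict;
use warnings;

my $text_normal = qr{^(\/F\d+) FF (?>((?:[^S]|S[^C]|SC[^F]|SCF\B)*))SCF SF (?>((?:[^M]|M[^V]|MV\B)*))MV (?>(\((?:[^S]|S[^H]|SH.)*))SH$};

# Hypothetical input line; the real format is an assumption.
my $line = '/F12 FF 0.5 SCF SF 100 200 MV (Hello World) SH';

# In list context the match returns the four captures (empty list on failure).
my @fields = $line =~ $text_normal;

# Each atomic capture includes the space before its delimiter, so trim it.
s/\s+$// for @fields;

print join('|', @fields), "\n";   # /F12|0.5|100 200|(Hello World)
```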
Alternatively, the verbose, exhaustive negative matches can be expressed with negative lookaheads, but I don't know how that would affect performance. A negative lookahead would work as follows:
((?:(?! SCF).)*) SCF SF
This means that the capture accepts any character that is not the space starting the string " SCF", and therefore stops right before " SCF SF".
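A minimal sketch of the lookahead variant, written with the lookahead ahead of the dot so that each character is checked before it is consumed. The sample line and variable names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical input line; the real data format is an assumption.
my $line = 'FF Some text SCF SF rest';

# Consume one character at a time, but only while " SCF" does not start here.
my $tempered = qr{^FF ((?:(?! SCF).)*) SCF SF};

my ($cap) = $line =~ $tempered;
print "'$cap'\n";   # 'Some text'
```

Whether this is faster or slower than the exhaustive alternation would need to be measured, for example with the core Benchmark module.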