You're processing each file five times, so the first thing you should do (as Paul Sunwald said) is try to reduce that number by combining your regular expressions. I would also avoid reluctant quantifiers; they're designed for convenience, not efficiency. Consider this regular expression:
<script.*?</script>
Every time the . advances to the next character, the engine first has to make sure that </script> won't match at that position. It's effectively a negative lookahead at every position:
<script(?:(?!</script>).)*</script>
But we know the lookahead is pointless unless the next character is < , so we can adjust the regular expression accordingly:
<script[^<]*(?:<(?!/script>)[^<]*)*</script>
When I test them in RegexBuddy with this target string:
<script type="text/javascript">var imagePath='http://sstatic.net/stackoverflow/img/';</script>
... the reluctant regular expression takes 173 steps to find the match, while the unrolled one takes only 28.
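If you don't have RegexBuddy handy, you can see the same effect by timing the two patterns; the step counts aren't exposed, but the gap shows up in the runtime. A rough Python sketch (the language and the timeit harness are just my choice for illustration):

```python
import re
import timeit

html = ('<script type="text/javascript">var imagePath='
        "'http://sstatic.net/stackoverflow/img/';</script>")

# re.S lets . cross newlines in the lazy version; the unrolled one has no dot.
lazy     = re.compile(r"<script.*?</script>", re.S)
unrolled = re.compile(r"<script[^<]*(?:<(?!/script>)[^<]*)*</script>")

# Both patterns find the same match; they just do very different
# amounts of backtracking to get there.
assert lazy.search(html).group() == unrolled.search(html).group()

for name, pattern in (("lazy", lazy), ("unrolled", unrolled)):
    t = timeit.timeit(lambda: pattern.search(html), number=100_000)
    print(f"{name:8s} {t:.3f}s for 100,000 searches")
```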
Combining your first three regular expressions into one, you get this beast:
<(?:(script|style)[^<]*(?:<(?!/\1)[^<]*)*</\1>|[!/]?[a-zA-Z-]+[^<>]*>)
You may want to fold the <HEAD> element into it while you're at it (i.e., (script|style|head) ).
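Assuming you're simply deleting whatever those regexes match (I'm guessing here), the combined pattern can do it in one substitution. Another rough Python sketch:

```python
import re

# Combined pattern from above, with "head" folded in as suggested.
# Add re.I if the pages use upper-case tag names.
TAGS = re.compile(
    r"<(?:(script|style|head)[^<]*(?:<(?!/\1)[^<]*)*</\1>"
    r"|[!/]?[a-zA-Z-]+[^<>]*>)"
)

def strip_tags(html: str) -> str:
    # One pass replaces three: script/style/head blocks (contents and all)
    # and ordinary tags are removed by the same substitution.
    return TAGS.sub("", html)

print(strip_tags('<p>Hello <b>world</b><script>var x = 1;</script></p>'))
# Hello world
```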
I don't know what you're doing with the fourth regular expression, the one for character entities: are you just deleting them? I assume the fifth one has to be run separately, since some of the whitespace it cleans up is generated by the earlier steps. But try the combined version of the first three and see how the timings compare; that should tell you whether this approach is worth pursuing.
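For that comparison, something along these lines would do; the three "separate" patterns are only my guesses at what your originals look like, so substitute your own:

```python
import re
import timeit

html = (
    '<html><head><title>t</title></head><body>'
    '<script type="text/javascript">var x = 1 < 2;</script>'
    '<p class="a">Hello <b>world</b></p>'
    '</body></html>'
) * 1000

# Guessed stand-ins for the original first three regexes.
separate = [
    re.compile(r"<script[^<]*(?:<(?!/script>)[^<]*)*</script>"),
    re.compile(r"<style[^<]*(?:<(?!/style>)[^<]*)*</style>"),
    re.compile(r"<[!/]?[a-zA-Z-]+[^<>]*>"),
]

combined = re.compile(
    r"<(?:(script|style)[^<]*(?:<(?!/\1)[^<]*)*</\1>"
    r"|[!/]?[a-zA-Z-]+[^<>]*>)"
)

def three_passes(text):
    for rx in separate:
        text = rx.sub("", text)
    return text

def one_pass(text):
    return combined.sub("", text)

# Sanity check: both approaches should strip the same markup.
assert three_passes(html) == one_pass(html)

for name, fn in (("three passes", three_passes), ("one pass", one_pass)):
    print(f"{name:12s} {timeit.timeit(lambda: fn(html), number=50):.3f}s")
```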