Regular expression matches spaces but skips sections

Question

I understand that since regex is essentially stateless, it is difficult to achieve complex matches without resorting to supplemental application logic, but I am curious whether the following is possible.

Match all spaces, easy enough: \s+

But skip the spaces between specific delimiters, in my case the word NOSTRIP.

Are there any tricks to achieve this? I was thinking along the lines of two separate matches, one for all spaces and one for the NOSTRIP sections, and somehow negating the latter from the former.

    "This is some text NOSTRIP this is more text NOSTRIP some more text."
    // becomes
    "ThisissometextNOSTRIP this is more text NOSTRIPsomemoretext."

Nested NOSTRIP sections don't matter, and I'm not trying to parse an HTML tree or anything like that, just tidying up a text file while preserving the spaces inside NOSTRIP sections for obvious reasons.

(Is that better?)

This is ultimately what I went with. I am sure it can be optimized in several places, but for now it works well.

    public function stripWhitespace($html, array $skipTags = array('pre')){
        // Turn each tag name into a pattern matching the whole element
        foreach($skipTags as &$tag){
            $tag = "<{$tag}.*?/{$tag}>";
        }
        unset($tag); // break the reference left over from the loop

        // Swap each skipped element for a numbered \x1D placeholder
        $skipped = array();
        $buffer = preg_replace_callback(
            '#(?<tag>' . implode('|', $skipTags) . ')#si',
            function($match) use(&$skipped){
                $skipped[] = $match['tag'];
                return "\x1D" . (count($skipped) - 1) . "\x1D";
            },
            $html
        );

        // Collapse whitespace runs, then drop whitespace next to tags
        $buffer = preg_replace('#\s+#si', ' ', $buffer);
        $buffer = preg_replace('#(?:(?<=>)\s|\s(?=<))#si', '', $buffer);

        // Restore the skipped elements
        for($i = count($skipped) - 1; $i >= 0; $i--){
            $buffer = str_replace("\x1D{$i}\x1D", $skipped[$i], $buffer);
        }
        return $buffer;
    }

+4

php regex preg-replace preg-match whitespace

Dan May 12, '11 at 20:46


2 answers

I once wrote a set of functions to reduce whitespace in HTML output:

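    function minify($html) {
        if(empty($html)) {
            return $html;
        }
        // Split into: text before the first <pre> block, the block itself, and the rest.
        // The U modifier inverts greediness, so (.*) is lazy and (.*?) is greedy.
        // preg_replace_callback stands in for the /e modifier, which was removed in PHP 7.
        $html = preg_replace_callback(
            '/^(.*)((<pre.*<\/pre>)(.*?))?$/Us',
            function($m) {
                // Groups 3 and 4 are absent when no <pre> block was found
                return parse($m[1]) . ($m[3] ?? '') . minify($m[4] ?? '');
            },
            $html
        );
        return $html;
    }

    function parse($html) {
        // Replace multiple spaces with a single space
        $html = preg_replace('/(\s+)/m', ' ', $html);
        // Remove spaces that are followed by either > or <
        $html = preg_replace('/ ([<>])/', '$1', $html);
        // Remove spaces that follow a >
        $html = str_replace('> ', '>', $html);
        return $html;
    }

    $html = minify($html);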

You may need to tweak this a bit to fit your needs.

+1

Arjan May 12, '11 at 21:46


Matt · Accepted Answer · 2011-05-12T21:41:40+0000

If you are using a scripting language, I would use a multi-stage approach:

pull out the NOSTRIP sections, save them in an array, and replace them with markers (### or something similar)
remove all spaces
reinsert all your saved NOSTRIP sections
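
The three stages above can be sketched in PHP roughly as follows. This is only an illustration, not code from the answer: the function name `stripOutsideDelimiters` is hypothetical, and it assumes the `\x1D` control character (borrowed from the question's own solution) never occurs in the input, so it is safe to use as a marker.

```php
<?php
// Hypothetical helper (not from the original answer): removes whitespace
// everywhere except between pairs of a delimiter word.
function stripOutsideDelimiters(string $text, string $delimiter = 'NOSTRIP'): string
{
    $saved = [];
    $quoted = preg_quote($delimiter, '#');

    // Stage 1: pull out each delimited section and leave a numbered marker.
    // Assumption: \x1D does not appear in the input text.
    $buffer = preg_replace_callback(
        "#{$quoted}.*?{$quoted}#s",
        function (array $match) use (&$saved): string {
            $saved[] = $match[0];
            return "\x1D" . (count($saved) - 1) . "\x1D";
        },
        $text
    );

    // Stage 2: remove all whitespace from what is left.
    $buffer = preg_replace('#\s+#', '', $buffer);

    // Stage 3: put the saved sections back in place of their markers.
    return preg_replace_callback(
        "#\x1D(\d+)\x1D#",
        function (array $match) use ($saved): string {
            return $saved[(int) $match[1]];
        },
        $buffer
    );
}

echo stripOutsideDelimiters(
    'This is some text NOSTRIP this is more text NOSTRIP some more text.'
), "\n";
// prints: ThisissometextNOSTRIP this is more text NOSTRIPsomemoretext.
```

A delimiter without a matching partner is not treated as a section here, so the whitespace around it is stripped like everywhere else.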
