There were some great solutions here, but none of them was ideal for extracting parts of code from, say, HTML. That was exactly my problem, since I need to extract the script blocks from an HTML document before compressing it. So, building on the original solution by @raina77ow as extended by @Cas Tuyn, I came up with the following:
    <?php
    $test_strings = [
        '0<p>a</p>1<p>b</p>2<p>c</p>3',
        '0<p>a</p>1<p>b</p>2<p>c</p>',
        '<p>a</p>1<p>b</p>2<p>c</p>3',
        '<p>a</p>1<p>b</p>2<p>c</p>',
        '<p></p>1<p>b'
    ];

    function getDelimitedStrings($str, $startDelimiter, $endDelimiter) {
        // Always return both keys, 'in' first, even when nothing matches
        $contents = array('in' => array(), 'out' => array());
        $startDelimiterLength = strlen($startDelimiter);
        $endDelimiterLength = strlen($endDelimiter);
        $startFrom = 0; // where to look for the next start delimiter
        $outStart = 0;  // where the current run of "outside" text begins

        while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
            // Find the end delimiter that closes this element
            $contentEnd = strpos($str, $endDelimiter, $contentStart + $startDelimiterLength);
            if (false === $contentEnd) {
                break; // unclosed element: the rest is treated as outside text
            }
            // Text between the previous element (or the beginning) and this one
            if ($contentStart > $outStart) {
                $contents['out'][] = substr($str, $outStart, $contentStart - $outStart);
            }
            // The whole element, including both delimiters
            $blockEnd = $contentEnd + $endDelimiterLength;
            $contents['in'][] = substr($str, $contentStart, $blockEnd - $contentStart);
            $startFrom = $outStart = $blockEnd;
        }

        // Trailing text after the last complete element
        if ($outStart < strlen($str)) {
            $contents['out'][] = substr($str, $outStart);
        }
        return $contents;
    }

    foreach ($test_strings as $string) {
        var_dump(getDelimitedStrings($string, '<p>', '</p>'));
    }
This extracts every `<p>` element, including any inner HTML, into 'in', and the text between the elements into 'out', giving this result:
    array (size=2)
      'in' =>
        array (size=3)
          0 => string '<p>a</p>' (length=8)
          1 => string '<p>b</p>' (length=8)
          2 => string '<p>c</p>' (length=8)
      'out' =>
        array (size=4)
          0 => string '0' (length=1)
          1 => string '1' (length=1)
          2 => string '2' (length=1)
          3 => string '3' (length=1)

    array (size=2)
      'in' =>
        array (size=3)
          0 => string '<p>a</p>' (length=8)
          1 => string '<p>b</p>' (length=8)
          2 => string '<p>c</p>' (length=8)
      'out' =>
        array (size=3)
          0 => string '0' (length=1)
          1 => string '1' (length=1)
          2 => string '2' (length=1)

    array (size=2)
      'in' =>
        array (size=3)
          0 => string '<p>a</p>' (length=8)
          1 => string '<p>b</p>' (length=8)
          2 => string '<p>c</p>' (length=8)
      'out' =>
        array (size=3)
          0 => string '1' (length=1)
          1 => string '2' (length=1)
          2 => string '3' (length=1)

    array (size=2)
      'in' =>
        array (size=3)
          0 => string '<p>a</p>' (length=8)
          1 => string '<p>b</p>' (length=8)
          2 => string '<p>c</p>' (length=8)
      'out' =>
        array (size=2)
          0 => string '1' (length=1)
          1 => string '2' (length=1)

    array (size=2)
      'in' =>
        array (size=1)
          0 => string '<p></p>' (length=7)
      'out' =>
        array (size=1)
          0 => string '1<p>b' (length=5)
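One limitation worth noting: the 'in' and 'out' arrays do not record how the pieces were interleaved. The second and third test strings, for example, both yield three 'in' and three 'out' entries, so the result alone does not tell you whether the string began with a `<p>` element or with plain text. If you need to put the HTML back together afterwards (say, after compressing the 'out' parts), a variant that returns the segments in order may be handier. The sketch below is a hypothetical `splitDelimitedSegments()` helper, not part of the code above; it uses the same `strpos()` approach and likewise assumes the elements are not nested and the start delimiter carries no attributes.

```php
<?php
/**
 * Hypothetical helper: split $str into an ordered list of segments.
 * Each segment is ['in' => bool, 'text' => string]; 'in' is true for a
 * complete start...end block (delimiters included).
 * Assumes blocks are not nested; an unclosed start delimiter stays in the outside text.
 */
function splitDelimitedSegments(string $str, string $startDelimiter, string $endDelimiter): array
{
    $segments = [];
    $offset = 0; // start of the text not yet assigned to a segment
    while (false !== ($start = strpos($str, $startDelimiter, $offset))) {
        $end = strpos($str, $endDelimiter, $start + strlen($startDelimiter));
        if (false === $end) {
            break; // no closing delimiter: treat the rest as outside text
        }
        if ($start > $offset) {
            $segments[] = ['in' => false, 'text' => substr($str, $offset, $start - $offset)];
        }
        $blockEnd = $end + strlen($endDelimiter);
        $segments[] = ['in' => true, 'text' => substr($str, $start, $blockEnd - $start)];
        $offset = $blockEnd;
    }
    if ($offset < strlen($str)) {
        $segments[] = ['in' => false, 'text' => substr($str, $offset)];
    }
    return $segments;
}

$segments = splitDelimitedSegments('<p>a</p>1<p>b</p>2<p>c</p>3', '<p>', '</p>');
// Joining the texts in order gives back the original string
echo implode('', array_column($segments, 'text')), "\n"; // prints <p>a</p>1<p>b</p>2<p>c</p>3
```

Because the order is kept, you can transform only the segments where 'in' is false and then `implode()` everything back together.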
You can see the demo here: 3v4l.org/TQLmn
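For the use case that prompted this, compressing HTML while leaving the script blocks alone, the split and the compression can also be done in one pass. The sketch below swaps in a different technique: `preg_split()` with `PREG_SPLIT_DELIM_CAPTURE`, which keeps each captured `<script>` block in the result, so the surrounding HTML ends up at the even indices and the script blocks at the odd ones. The function name `compressHtmlKeepScripts()` and the "compression" (collapsing runs of whitespace) are illustrative assumptions only; a real minifier does much more. Unlike a plain search for the literal `'<script>'`, the pattern also matches opening tags with attributes, such as `<script type="text/javascript">`.

```php
<?php
/**
 * Hypothetical example: collapse whitespace in HTML but leave <script> blocks untouched.
 * The whitespace collapsing stands in for whatever compression you actually use.
 */
function compressHtmlKeepScripts(string $html): string
{
    // With one capture group, preg_split() keeps each <script>...</script> block:
    // even indices hold the outside HTML, odd indices hold the script blocks.
    $parts = preg_split('#(<script\b[^>]*>.*?</script>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Outside a script block: collapse each run of whitespace into one space
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }
    return implode('', $parts);
}

$html = "<div>\n    <p>Hello</p>\n</div>\n<script>\n  var x = 1;\n</script>\n<p>Bye</p>";
echo compressHtmlKeepScripts($html), "\n";
```

Like the `strpos()` version, this assumes script blocks are not nested and that no `</script>` appears inside a script's own string literals.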
Kim Steinhaug, May 22 '19 at 21:56