Highlight the match results in the subject string from preg_match_all()

I am trying to highlight the subject string using the $matches array returned by preg_match_all(). Let me start with an example:

preg_match_all("/(.)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER); 

This will return:

    Array
    (
        [0] => Array
            (
                [0] => Array ( [0] => a [1] => 0 )
                [1] => Array ( [0] => a [1] => 0 )
            )
        [1] => Array
            (
                [0] => Array ( [0] => b [1] => 1 )
                [1] => Array ( [0] => b [1] => 1 )
            )
        [2] => Array
            (
                [0] => Array ( [0] => c [1] => 2 )
                [1] => Array ( [0] => c [1] => 2 )
            )
    )
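To make the structure above easier to read, here is a minimal sketch that walks the array: the outer index is the match number, the inner index is the group number (0 is the full match), and each entry is a pair of matched text and offset. The variable name $lines is only for illustration. Note that the offsets are byte offsets into the subject.

```php
<?php
preg_match_all('/(.)/', 'abc', $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

$lines = [];
foreach ($matches as $matchIndex => $groups) {
    // Group 0 is the full match; groups 1..n are the capture groups
    foreach ($groups as $groupIndex => [$text, $offset]) {
        $lines[] = "match $matchIndex, group $groupIndex: '$text' at offset $offset";
    }
}

echo implode("\n", $lines), "\n";
```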

In this case, I want to highlight the full match (everything the pattern consumed) and each backreference (capture group).

The result should look like this:

 <span class="match0"> <span class="match1">a</span> </span> <span class="match0"> <span class="match1">b</span> </span> <span class="match0"> <span class="match1">c</span> </span> 

Another example:

 preg_match_all("/(abc)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER); 

This should produce:

 <span class="match0"><span class="match1">abc</span></span> 

Hope this is clear enough.

To restate: I want to highlight the full match and each backreference within it.

Thanks in advance. If something is unclear, please ask.

Note: it must not break HTML. Both the regex and the subject string are unknown to the code and completely dynamic. Thus the subject string can be HTML, and the matched data can contain HTML-like text or anything else.

+4
4 answers

This seems to behave correctly for all the examples I have thrown at it. Note that I separated the highlighting part from the HTML-mangling part so it can be reused in other situations:

    <?php
    /**
     * Runs a regex against a string and returns a version of that string with matches highlighted.
     * The outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1], etc.
     *
     * @param string $regex Regular expression ready to be passed to preg_match_all
     * @param string $input
     * @return string
     */
    function highlight_regex_matches($regex, $input)
    {
        $matches = array();
        preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

        // Arrange matches into groups based on their starting and ending offsets
        $matches_by_position = array();
        foreach ($matches as $sub_matches) {
            foreach ($sub_matches as $match_group => $match_data) {
                // Groups that did not take part in the match are reported with offset -1
                if ($match_data[1] < 0) {
                    continue;
                }
                $start_position = $match_data[1];
                $end_position = $start_position + strlen($match_data[0]);

                $matches_by_position[$start_position]['START'][] = $match_group;
                $matches_by_position[$end_position]['END'][] = $match_group;
            }
        }

        // Now proceed through that array, annotating the original string.
        // Note that we have to pass through BACKWARDS, or we break the offset information.
        $output = $input;
        krsort($matches_by_position);
        foreach ($matches_by_position as $position => $groups) {
            $insertion = '';

            // First, assemble any ENDING groups, nested highest-group first
            if (isset($groups['END'])) {
                rsort($groups['END']);
                foreach ($groups['END'] as $ending_group) {
                    $insertion .= "[/$ending_group]";
                }
            }

            // Then, any STARTING groups, nested lowest-group first
            if (isset($groups['START'])) {
                sort($groups['START']);
                foreach ($groups['START'] as $starting_group) {
                    $insertion .= "[$starting_group]";
                }
            }

            // Insert into output
            $output = substr_replace($output, $insertion, $position, 0);
        }

        return $output;
    }

    /**
     * Given a regex and a string containing unescaped HTML, return a blob of HTML
     * with the original string escaped, and matches highlighted using <span> tags.
     *
     * @param string $regex Regular expression ready to be passed to preg_match_all
     * @param string $raw_html
     * @return string HTML ready to display :)
     */
    function highlight_regex_as_html($regex, $raw_html)
    {
        // Add the (deliberately non-HTML) highlight tokens
        $highlighted = highlight_regex_matches($regex, $raw_html);

        // Escape the HTML from the input
        $highlighted = htmlspecialchars($highlighted);

        // Substitute the match tokens with the desired HTML
        $highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
        $highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);

        return $highlighted;
    }

NOTE: As hakra pointed out to me in chat, if a subgroup in a regular expression can match several times within one complete match (for example, '/a(b|c)+/'), preg_match_all will only tell you about the last of those matches. Therefore highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]', not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect or want. All matching groups outside of that one still work correctly, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]', which is still a good indication of how the regular expression matched.
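The limitation can be seen directly in the raw preg_match_all() output. This is a small sketch using only the pattern and subject from the note; the variable names $m, $full and $group1 are chosen for illustration.

```php
<?php
preg_match_all('/a(b|c)+/', 'abc', $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

// There is a single overall match, and group 1 appears only once in it,
// even though (b|c) matched twice while the pattern was being applied.
$full   = $m[0][0]; // text and offset of the full match
$group1 = $m[0][1]; // text and offset of the last iteration of group 1

var_dump($full, $group1);
```

Only the final iteration ('c' at offset 2) is reported for group 1, which is why the earlier 'b' cannot be highlighted as part of that group.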

+3

Judging by your comment on the first answer, I am fairly sure you did not formulate the question the way you intended. However, taking literally what you asked for:

    $pattern = "/(.)/";
    $subject = "abc";

    $callback = function ($matches) {
        // Only the case where group 1 spans the entire match is supported
        if ($matches[0] !== $matches[1]) {
            throw new InvalidArgumentException(
                sprintf('you do not match the requirements, go away: %s', print_r($matches, true))
            );
        }

        return sprintf(
            '<span class="match0"><span class="match1">%s</span></span>',
            htmlspecialchars($matches[1])
        );
    };

    $result = preg_replace_callback($pattern, $callback, $subject);

Before you start complaining, first look at where the flaw is in your description of the problem. I have a feeling that what you really want is to analyze the match results, including the sub-matches. That does not work unless you also parse the regular expression to find out which groups are used. None of that is done so far: not in your question, and not in this answer either.

So please note: this example handles only a single subgroup, which is also required to span the entire pattern. Apart from that, it is fully dynamic.


0

I am not very familiar with posting on Stack Overflow, so I hope I don't mess this up. I do almost the same as @IMSoP, but with a slight difference:

I store tags as follows:

    $tags[$matched_pos]['open'][$backref_nr] = "open tag";
    $tags[$matched_pos + $len]['close'][$backref_nr] = "close tag";

As you can see, it is almost identical to @IMSoP's approach.

Then I build the string like this, instead of inserting and sorting as @IMSoP does:

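    $finalStr = "";
    $length = strlen($text);
    for ($i = 0; $i <= $length; $i++) {
        // Emit any tags registered at this offset (offset $length can only hold closing tags)
        if (isset($tags[$i])) {
            foreach ($tags[$i] as $tag) {
                foreach ($tag as $span) {
                    $finalStr .= $span;
                }
            }
        }
        // Avoid reading past the end of the string on the final iteration
        if ($i < $length) {
            $finalStr .= $text[$i];
        }
    }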

Here $text is the subject string passed to preg_match_all().

I think my solution is a little faster than @IMSoP's, since his has to sort at every position, and so on. But I'm not sure.

My main concern right now is performance. Perhaps it is simply not possible to make this any faster?

I am trying to get a recursive preg_replace_callback() approach working, but I have not managed it so far. preg_replace_callback() seems very, very fast, much faster than what I'm doing right now.
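For reference, here is a minimal sketch of how a preg_replace_callback() variant might look. It assumes PHP 7.4 or later, where preg_replace_callback() accepts a flags argument, so the callback receives offsets via PREG_OFFSET_CAPTURE. The function name highlight_with_callback is hypothetical and not taken from any answer here, and no claim is made about its speed. It shares the last-iteration limitation for repeated groups, and it skips empty captures and captures made by lookarounds outside the full match.

```php
<?php
// Hypothetical helper for illustration; needs PHP 7.4+ for the $flags argument.
function highlight_with_callback(string $regex, string $subject): ?string
{
    $callback = function (array $m): string {
        [$text, $base] = $m[0];   // full match and its byte offset in the subject
        $length = strlen($text);
        $marks = [];              // offset inside the match => START/END group lists

        foreach ($m as $group => [$value, $offset]) {
            $start = $offset - $base;
            $end = $start + strlen($value);
            // Skip named-group keys, unmatched groups (offset -1), empty captures
            // and captures that lie outside the full match (lookarounds).
            if (!is_int($group) || $offset < 0 || $value === '' || $start < 0 || $end > $length) {
                continue;
            }
            $marks[$start]['START'][] = $group;
            $marks[$end]['END'][] = $group;
        }

        ksort($marks);
        $out = '';
        $prev = 0;
        foreach ($marks as $pos => $mark) {
            // Escape the text between the previous marker and this one
            $out .= htmlspecialchars(substr($text, $prev, $pos - $prev));
            // Closing tags are all identical, so only their count matters
            $out .= str_repeat('</span>', count($mark['END'] ?? []));
            $starts = $mark['START'] ?? [];
            sort($starts);        // open outer (lower-numbered) groups first
            foreach ($starts as $group) {
                $out .= '<span class="match' . $group . '">';
            }
            $prev = $pos;
        }

        return $out;
    };

    return preg_replace_callback($regex, $callback, $subject, -1, $count, PREG_OFFSET_CAPTURE);
}

echo highlight_with_callback('/(.)/', 'abc'), "\n";
```

Text outside the matches is returned untouched by preg_replace_callback(), so in this sketch it would still need to be escaped separately before being output as HTML.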

0

Quick mashup, why use regex?

    $content = "abc";
    $endcontent = "";
    // Wrap each character in the two nested spans
    for ($i = 0; $i < strlen($content); $i++) {
        $endcontent .= "<span class=\"match0\"><span class=\"match1\">" . $content[$i] . "</span></span>";
    }
    echo $endcontent;
-1