Select text with surrounding words

I want to select the specified keyword in a given line, together with a random number of the words surrounding it.

Example sentence:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.

Keyword example:

dolore magna

Desired result (mark 0-4 words before and after the keyword):

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et **dolore magna** aliquyam erat, sed.

What have I tried?

    ( [\w,\.-\?]+){0,5} ".$myKeyword." (.+ ){2,5}

as well as

    ([a-zA-Z,. ]+){1,3} ".$n." ([a-zA-Z,. ]+){1,3}

Any ideas how to improve this and make it more reliable?
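One thing worth checking in the attempts above: inside a character class, an unescaped - between two characters forms a range, so [\w,\.-\?] accepts every character from . (0x2E) to ? (0x3F), which includes /, :, ;, <, = and >, and does not accept a literal -. Escaping the hyphen, as in the [\w,.\-?] class used in one of the answers, keeps it literal. A small check, assuming PHP's PCRE functions:

```php
<?php
// In this class, "\.-\?" is a RANGE from '.' (0x2E) to '?' (0x3F).
$asRange = '/^[\w,\.-\?]+$/';
// Here the hyphen is escaped, so the class contains a literal '-'.
$asLiteral = '/^[\w,.\-?]+$/';

foreach (['a/b', 'a;b', 'a-b'] as $sample) {
    // preg_match() returns 1 for a match, 0 for no match.
    printf("%-4s range: %d  literal: %d\n", $sample, preg_match($asRange, $sample), preg_match($asLiteral, $sample));
}
```

'/' and ';' fall inside the accidental range, while '-' (0x2D) sits just below it, which is the opposite of what the class seems to intend.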

php regex
2 answers

To highlight, use the preg_replace function. Here's the idea:

    $s = "dolore magna";
    $str = preg_replace('/\b(?>[\'\w-]+\W+){0,4}' . preg_quote($s, "/") . '(?:\W+[\'\w-]+){0,4}/i', '<b>$0</b>', $str);

Check the pattern on regex101 or the PHP test on eval.in:

    echo $str;

Output:

    Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor <b>invidunt ut labore et dolore magna aliquyam erat, sed</b>.

The i flag makes the match case-insensitive; remove it if you don't need that. The first group, (?>...), is atomic, for performance.

  • I used ['\w-] (\w, the shorthand for a word character, plus ' and -) to match a word character
  • \W matches a character that is not a word character (the negation of \w)
  • \b matches a word boundary; it is used to improve performance.
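A minimal sketch, assuming you want the word counts as parameters instead of the fixed {0,4}: the same pattern wrapped in a helper. The function name highlightWithContext is hypothetical; random_int() is PHP's built-in integer generator, used here to pick the question's "random number" of words for each side independently.

```php
<?php
// Hypothetical helper (not from the answer): bolds $keyword together with up to
// $before words in front of it and up to $after words behind it.
function highlightWithContext(string $text, string $keyword, int $before, int $after): string
{
    // Same pattern as above, with the {0,4} bounds made configurable.
    $pattern = '/\b(?>[\'\w-]+\W+){0,' . $before . '}'
        . preg_quote($keyword, '/')
        . '(?:\W+[\'\w-]+){0,' . $after . '}/i';

    // $0 is the whole match (context words + keyword); fall back to the input on a regex error.
    return preg_replace($pattern, '<b>$0</b>', $text) ?? $text;
}

$str = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.';

// Fixed bounds: up to 4 words on each side.
echo highlightWithContext($str, 'dolore magna', 4, 4), "\n";
// ... eirmod tempor <b>invidunt ut labore et dolore magna aliquyam erat, sed</b>.

// Random bounds, drawn separately for each side (0 to 4 inclusive).
echo highlightWithContext($str, 'dolore magna', random_int(0, 4), random_int(0, 4)), "\n";
```

Because both quantifiers are {0,n}, a keyword at the very start or end of the line still matches; it just gets fewer context words on that side.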

I think this will do what you need. Check the demo for an explanation of everything the regex does, or post a comment if you have a question.

Regex:

    ((?:[\w,.\-?]+\h){0,5})\b' . $term . '\b((?:.+\h){2,5})

Demo: https://regex101.com/r/vG8qT2/1

PHP:

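    <?php
    $string = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.';
    $term = 'dolore magna';
    $min = 0;
    $max = 5;

    // Group 1: up to $max words before the term.
    // Group 2: up to $max whitespace-terminated chunks after the term.
    // preg_quote() gets the '~' delimiter so it is escaped if the term contains it.
    preg_match('~((?:[\w,.\-?]+\h){' . $min . ',' . $max . '})\b' . preg_quote($term, '~') . '\b((?:.+\h){' . $min . ',' . $max . '})~', $string, $matches);
    print_r($matches);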

Demo: https://eval.in/410063

Note that the surrounding text ends up in $matches[1] (before the term) and $matches[2] (after it).
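A sketch of the same capture-group idea with named groups, assuming you want the words after the term counted the same way as the words before it. This is a variation, not this answer's exact pattern: both sides use the one-word-per-repetition pieces from the other answer (['\w-] words separated by \W+) instead of (?:.+\h), since a single .+ repetition can swallow several words. The group names before and after are arbitrary.

```php
<?php
$string = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed.';
$term = 'dolore magna';
$min = 0;
$max = 5;

// Each repetition matches exactly one word plus the separator next to it.
// Named groups make the two context parts easy to pick out of $matches.
$pattern = '~\b(?<before>(?:[\w\'-]+\W+){' . $min . ',' . $max . '})'
    . '\b' . preg_quote($term, '~') . '\b'
    . '(?<after>(?:\W+[\w\'-]+){' . $min . ',' . $max . '})~';

if (preg_match($pattern, $string, $matches)) {
    echo 'before: "' . $matches['before'] . "\"\n"; // before: "tempor invidunt ut labore et "
    echo 'after:  "' . $matches['after'] . "\"\n";  // after:  " aliquyam erat, sed"
}
```

The numeric keys $matches[1] and $matches[2] are still filled as well, so existing code that reads them keeps working.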
