How to match a string that does not contain a word?

To match a string containing some word, I can use the pattern "/.*word.*/". But how can I match a string that does not contain this word?

Example:

I need to find a substring in a large text that is enclosed in a pair of tags and contains the string "Hello". The best I came up with:

"@<div>(.*?Hello.*?)</div>@i" 

But it will also match the whole of this sequence:

 <div>Bye.</div><div>Hello!</div> 

And I don't want to match the first pair of div tags, so I want to replace ".*?" with something like "match any string except one that contains a given word".

Test case:

For the input line:

 <div>Bye.</div><div>Hello!</div> 

I need to catch

 <div>Hello!</div> 
3 answers

A better title for this question might be: "Match a DIV element containing a specific substring." First, it needs to be said that a regular expression is not the best tool for this job. It would be much better to use an HTML parser to parse the markup and then search the contents of each DIV element for the desired substring. However, since you seem to need to know how to use a regular expression to match something that is NOT something else, the following describes a limited way to do this with a regular expression.
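For comparison, here is a minimal sketch of the parser-based approach using PHP's bundled DOM extension. The helper name `findDivsContaining` is hypothetical and not part of the original answer, and restricting the search to innermost DIVs is an assumption made only so that the result mirrors the regex approach described below.

```php
<?php
// Hypothetical helper (not from the original answer): return the outer HTML
// of every innermost <div> whose text contains $needle, case-insensitively.
function findDivsContaining(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        // Assumption: skip DIVs that contain another DIV (innermost only).
        if ($div->getElementsByTagName('div')->length > 0) {
            continue;
        }
        if (stripos($div->textContent, $needle) !== false) {
            $found[] = $doc->saveHTML($div);
        }
    }
    return $found;
}

print_r(findDivsContaining('<div>Bye.</div><div>Hello!</div>', 'Hello'));
```

Dropping the innermost-only check would make the helper report outer DIVs as well, since `textContent` includes the text of nested elements.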

As Dogbert correctly points out, this question is indeed a duplicate of "Regular expression to match a string containing no words?". However, I can see that you have looked at that question but still need to know how to apply the technique to a subpattern.
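As a point of reference, the whole-string form of that technique can be sketched as follows. The helper name `lacksWord` is hypothetical; the pattern is the common "lookahead before every character" idiom, shown here under the assumption that the word is a literal string rather than a pattern.

```php
<?php
// Hypothetical helper: true when $subject does not contain $word anywhere.
function lacksWord(string $subject, string $word): bool
{
    // (?:(?!word).)* consumes one character at a time, but only at positions
    // where "word" does not start. The s flag lets . match newlines too.
    $re = '/\A(?:(?!' . preg_quote($word, '/') . ').)*\z/s';
    return preg_match($re, $subject) === 1;
}

var_dump(lacksWord('nothing to see here', 'word'));  // bool(true)
var_dump(lacksWord('a password reset', 'word'));     // bool(false)
```

The second call returns false because "password" contains "word"; wrapping the word in `\b` anchors would be needed to match whole words only.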

To match a part of a string (a sub-match) that does not contain a specific word (or words), you need to apply a negative lookahead assertion before each character. Here's how you could do this for the text between opening and closing DIV tags. Note that, because DIV elements can be nested, a single regular expression can only reliably find "HELLO" inside the "innermost" DIV elements.

Pseudocode:

  • Match the opening DIV tag.
  • Lazily match zero or more characters, each of which is not the start of <div or </div .
  • As soon as the required string "HELLO" is found, match it.
  • Continue by (greedily) matching zero or more characters, each of which is not the start of <div or </div .
  • Match the closing tag </div> .

Note that to match only the "innermost" DIV content, you must exclude both <div and </div while scanning the contents character by character. Here is the corresponding regular expression in the form of a tested PHP function:

 // Find an innermost DIV element containing the string "HELLO".
 function p1($text) {
     $re = '% # Match innermost DIV element containing "HELLO"
         <div[^>]*>        # DIV element start tag.
         (?:               # Group to match contents up to "HELLO".
           (?!</?div\b)    # Assert this char is not start of DIV tag.
           .               # Safe to match this non-DIV-tag char.
         )*?               # Lazily match contents one chara at a time.
         \bhello\b         # Match target "HELLO" word inside DIV.
         (?:               # Group to match content following "HELLO".
           (?!</?div\b)    # Assert this char is not start of DIV tag.
           .               # Safe to match this non-DIV-tag char.
         )*                # Greedily match contents one chara at a time.
         </div>            # DIV element end tag.
         %six';
     if (preg_match($re, $text, $matches)) {
         // Match found.
         return $matches[0];
     } else {
         // No match found
         return 'no-match';
     }
 }

This function will correctly match the desired DIV element in the following test data:

 <div>Bye.</div><div>Hello!</div> 

It will also correctly find "HELLO" inside the innermost part of nested div elements:

 <div> <div> Hello world! </div> </div> 

But, as mentioned earlier, it will NOT find the string "HELLO" when it is located inside a nested DIV element that is not the innermost one, for example:

 <div> Hello, <div> world! </div> </div> 

Handling that case is a lot harder.

There are many cases where this solution may fail. Again, I recommend using an HTML parser.
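The three test cases above can be checked with a short script. This is only a sketch: `p1Compact` is a hypothetical name for a condensed restatement of the same regex with the x-mode comments removed, returning 'no-match' in the same way as the original function.

```php
<?php
// Hypothetical condensed version of the regex above (same logic, no comments).
function p1Compact(string $text): string
{
    $re = '%<div[^>]*>(?:(?!</?div\b).)*?\bhello\b(?:(?!</?div\b).)*</div>%si';
    return preg_match($re, $text, $matches) ? $matches[0] : 'no-match';
}

$cases = [
    '<div>Bye.</div><div>Hello!</div>',         // sibling DIVs
    '<div> <div> Hello world! </div> </div>',   // target in the innermost DIV
    '<div> Hello, <div> world! </div> </div>',  // target in an outer DIV
];

foreach ($cases as $html) {
    echo p1Compact($html), "\n";
}
```

The third case yields 'no-match' because the characters after "Hello" run into a nested <div before any </div> is reached.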

+4
source
 '~<div>(?!.*?Bye\..*?</div>).+?</div>~' 
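A short sketch of how this pattern behaves may help. Note that it excludes DIVs followed by "Bye." rather than requiring "Hello", and that its lookahead is not confined to the current DIV; the second input below is an assumed variation of the question's test case with the two DIVs swapped.

```php
<?php
$re = '~<div>(?!.*?Bye\..*?</div>).+?</div>~';

// Case from the question: the unwanted DIV comes first.
preg_match($re, '<div>Bye.</div><div>Hello!</div>', $m);
echo $m[0], "\n";  // <div>Hello!</div>

// Swapped order: the lookahead scans past the first </div>, sees "Bye."
// later in the string, and rejects the "Hello!" DIV as well.
$count = preg_match($re, '<div>Hello!</div><div>Bye.</div>');
var_dump($count);  // int(0)
```

Confining the check to a single DIV needs the per-character lookahead shown in the first answer.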
+3
source

Can't you just check if you got a match?

If you want to run code only when the string does not contain the word "word":

 if (!preg_match("/word/i", $myString)) 

The code under the if runs only if "word" was not found.
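One caveat with the bare negation is that preg_match() returns false on error, which `!` also treats as "not found". A minimal sketch that separates the two outcomes, using a hypothetical helper named `doesNotContain` and assuming the word is a literal string:

```php
<?php
// Hypothetical helper: true when $needle does not occur in $haystack
// (case-insensitive). A regex engine error is reported, not read as "absent".
function doesNotContain(string $haystack, string $needle): bool
{
    $result = preg_match('/' . preg_quote($needle, '/') . '/i', $haystack);
    if ($result === false) {
        throw new RuntimeException('preg_match failed, code ' . preg_last_error());
    }
    return $result === 0;
}

var_dump(doesNotContain('<div>Bye.</div>', 'Hello'));    // bool(true)
var_dump(doesNotContain('<div>Hello!</div>', 'hello'));  // bool(false)
```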

0
source
