A better title for this question might be: "Match a DIV element containing a specific substring." First, it needs to be said that a regular expression is not the best tool for this job. It would be much better to parse the markup with an HTML parser and then search the contents of each DIV element for the desired substring. That said, since you want to know how to use a regular expression to match text that does not contain something else, the following describes a limited way to do this with a regular expression.
As Dogbert correctly points out, this question is really a duplicate of "Regular expression to match a string not containing a word?". However, I can see that you have looked at that answer but still need to know how to apply the technique to a sub-pattern.
To match a portion of a string (a substring) that does not contain a specific word (or words), you need to apply a negative lookahead check before each and every character. Here is how you could do this for the text between opening and closing DIV tags. Note that when using only a single regular expression, because DIV elements can be nested, the "HELLO" to be found must be inside the "innermost" nested DIV element.
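To show the building block on its own, here is a minimal sketch of a negative lookahead applied before every character. The helper name lacks_word is hypothetical and only serves this illustration.

```php
<?php
// Hypothetical helper: returns true when $text contains no occurrence of $word.
function lacks_word(string $text, string $word): bool
{
    // (?!word) is asserted before each character that the dot consumes,
    // so the group can never step onto the first character of the word.
    $re = '/\A(?:(?!' . preg_quote($word, '/') . ').)*\z/s';
    return preg_match($re, $text) === 1;
}

var_dump(lacks_word('bar baz', 'foo'));      // bool(true)
var_dump(lacks_word('bar foo baz', 'foo'));  // bool(false)
```

The same guarded group, with the forbidden text changed to a DIV tag start, is what scans the element contents in the steps that follow.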
Pseudocode:
- Match the opening DIV tag.
- Lazily match zero or more characters, each of which is not the start of <div or </div.
- Once the desired string "HELLO" is found, go ahead and match it.
- Continue (greedily) matching zero or more characters, each of which is not the start of <div or </div.
- Match the closing </div> tag.
Note that to match only the "innermost" DIV contents, you must exclude both <div and </div while scanning the element contents character by character. Here is the corresponding regular expression in the form of a tested PHP function:
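The following is a minimal sketch of such a function, reconstructed from the pseudocode above. The function name match_div_containing_hello, the choice of delimiter, and the exact flags (s so the dot crosses newlines, i because the test data spells the word "Hello", x for the commented layout) are assumptions made for this illustration.

```php
<?php
// Hypothetical name. Returns the matched DIV element, or false if none matches.
function match_div_containing_hello(string $text)
{
    $re = '%
        <div\b[^>]*>      # Opening DIV tag, with optional attributes.
        (?:               # Contents before the target string:
          (?!</?div\b)    #   the next char must not start <div or </div,
          .               #   then consume that one char.
        )*?               # Lazily, so the first HELLO is reached.
        HELLO             # The required substring.
        (?:               # Contents after the target string:
          (?!</?div\b)    #   same guard before every char,
          .               #   then consume that one char.
        )*                # Greedily, up to the closing tag.
        </div>            # Closing DIV tag.
        %six';
    return preg_match($re, $text, $m) === 1 ? $m[0] : false;
}

var_dump(match_div_containing_hello('<div>Bye.</div><div>Hello!</div>'));
// string(17) "<div>Hello!</div>"
```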
This function will correctly match the desired DIV element of the following test data:
<div>Bye.</div><div>Hello!</div>
It will also correctly find "HELLO" inside the innermost part of nested div elements:
<div> <div> Hello world! </div> </div>
But, as mentioned earlier, it will NOT find the string "HELLO" when it is located inside a nested DIV element that is not the innermost one, for example:
<div> Hello, <div> world! </div> </div>
Handling that case is much more complex.
There are many cases where this solution can fail. Once again, I recommend using an HTML parser.
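For comparison, here is a minimal sketch of the parser-based approach using PHP's bundled DOM extension. It assumes ext-dom is available; the helper name find_divs_containing is hypothetical.

```php
<?php
// Hypothetical helper: returns the markup of every DIV whose text contains $needle.
function find_divs_containing(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        // textContent includes the text of all descendants; stripos is case-insensitive.
        if (stripos($div->textContent, $needle) !== false) {
            $found[] = $doc->saveHTML($div);
        }
    }
    return $found;
}

print_r(find_divs_containing('<div> Hello, <div> world! </div> </div>', 'HELLO'));
```

Unlike the regular expression, this finds the outer DIV in the last test case. Because textContent covers descendants, an outer DIV is also reported when only a nested DIV holds the string; inspecting each DIV's direct text nodes would be needed to rule that out.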