Regular expression: find two elements that are not separated by a third element

I need to find badly formatted HTML in some text. We allow users to add strong and em tags, but they don't always close them correctly:

 This is some <b>correct</b> formatting
 This is some <b>incorrect<b> formatting

I want to catch the cases where the formatting is wrong, i.e. a tag is opened and then opened again with no closing tag in between. I started with a negative lookahead, but haven't had much success yet:

 <b>(?!.*?<\/b>.*?)<b> 
  • <b> match an opening tag
  • (?! negative lookahead for
    • .*? anything, non-greedily
    • <\/b> a closing tag
    • .*? anything, non-greedily
  • ) end of the lookahead
  • <b> another opening tag
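A quick way to see why this attempt fails is to run it. The sketch below uses Python's re module (the test strings are made up for illustration). The lookahead is zero-width: it checks what follows but consumes nothing, so the second <b> has to come immediately after the first, and "<b>incorrect<b>" is never matched.

```python
import re

# The pattern from the question, copied as-is.
attempt = re.compile(r"<b>(?!.*?<\/b>.*?)<b>")

text = "This is some <b>correct</b> formatting This is some <b>incorrect<b> formatting"

# The lookahead consumes no characters, so "<b>" must directly follow "<b>".
# The bad example has text between the two tags, so nothing is found.
print(attempt.search(text))  # None

# It only fires on two adjacent <b> tags with no </b> anywhere after them.
print(attempt.search("x <b><b> y").group(0))  # <b><b>
```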

Any idea how I could do this?

Addendum: I know about Tony the Pony, but I don't think it applies here. The problem can be restated as 'I want to find two occurrences of the word "zoinx" with no occurrence of the word "palantir" in between', which has nothing to do with HTML.

Answer
 <b>(?:(?!<\/b>).)*<b> 

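Try this. The group (?:(?!<\/b>).)* consumes one character at a time, but only at positions where </b> does not start, so a match can never run across a closing tag. See the demo: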

https://regex101.com/r/nS2lT4/19
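The same pattern shape also answers the "zoinx"/"palantir" version of the question. Below is a minimal Python sketch; find_unclosed is a hypothetical helper, not part of any library. It escapes both strings with re.escape so they are matched literally, and it assumes you want re.DOTALL, because by default "." does not match a newline and tags split across lines would otherwise be missed.

```python
import re

def find_unclosed(opener, closer, text):
    """Hypothetical helper: return each span that goes from `opener` to a
    later `opener` with no `closer` anywhere in between."""
    o, c = re.escape(opener), re.escape(closer)
    # (?:(?!closer).)* steps over one character at a time, refusing any
    # position where `closer` begins, so a match cannot cross a closer.
    # DOTALL lets "." step over newlines too (an assumption, see above).
    pattern = re.compile(rf"{o}(?:(?!{c}).)*{o}", re.DOTALL)
    return pattern.findall(text)

html = "This is some <b>correct</b> formatting This is some <b>incorrect<b> formatting"
print(find_unclosed("<b>", "</b>", html))  # ['<b>incorrect<b>']

words = "zoinx palantir zoinx and zoinx"
print(find_unclosed("zoinx", "palantir", words))  # ['zoinx and zoinx']
```

Because the * is greedy, three unclosed openers in a row come back as one long match; switch it to *? if you want each adjacent pair reported separately.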

For a generic version, use

 <([^>]*)>(?:(?!<\/\1>).)*<\1> 

See the demo.

https://regex101.com/r/nS2lT4/24
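Here is a short Python check of the generic pattern, using made-up sample strings. Group 1 captures the full text between < and >, and the backreference \1 requires the same tag again later. One caveat: because it pairs any tag with itself, void elements such as <br>, which never have a closing tag, are also reported. An opening tag that has attributes is only paired with an identical copy of itself.

```python
import re

# The answer's generic pattern: capture the tag text, then require the
# same tag again with no matching closing tag in between.
generic = re.compile(r"<([^>]*)>(?:(?!<\/\1>).)*<\1>")

text = "This is <em>fine</em> but <em>this<em> and <b>that<b> are not"
for m in generic.finditer(text):
    print(m.group(1), "->", m.group(0))
# em -> <em>this<em>
# b -> <b>that<b>

# Caveat: void elements like <br> are flagged too.
print(generic.search("line one<br>line two<br>").group(0))  # <br>line two<br>
```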

