Regular expression to match an opening and closing tag together with a specific text pattern inside it

Here is an example entry from my sitemap.xml:

<url> <loc>http://sitename.com/programming/php/?C=D;O=A</loc> <changefreq>weekly</changefreq> <priority>0.64</priority> </url> 

There are many such entries. Notice that the URL in the loc tag ends with C=D;O=A. I want to delete every entry, from <url> through </url>, whose content includes C=D;O=A or a similar pattern.

The following expression matches every <url> entry, whatever its content:

 <url>(.|\r\n)*?<\/url> 

but I want to match only the entries described above.

How do I write a regular expression that matches only under these conditions?

+7
3 answers

Try the following:

 /<url>(?:(?!<\/url>).)*C=D;O=A.*?<\/url>/m 

The negative lookahead ensures that the match does not span multiple <url> nodes.

You can try it out on Rubular.
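
For readers working outside Ruby, here is a minimal Python sketch of the same idea. It is an illustration under stated assumptions, not part of the original answer: re.DOTALL stands in for Ruby's /m so that the dot also matches newlines, "similar patterns" is assumed to mean C=<letter>;O=<letter>, and the function name and sample data are made up.

```python
import re

# Tempered dot: (?:(?!</url>).)* can never step over a closing </url>,
# so one match stays inside a single <url> entry.
# Assumption: "similar patterns" means C=<uppercase letter>;O=<uppercase letter>.
ENTRY = re.compile(
    r"<url>(?:(?!</url>).)*C=[A-Z];O=[A-Z].*?</url>\s*",
    re.DOTALL,  # like Ruby's /m: "." also matches newlines
)

def strip_matching_entries(sitemap_text: str) -> str:
    """Return the text with every matching <url>...</url> entry removed."""
    return ENTRY.sub("", sitemap_text)

# Hypothetical sample data modelled on the entry in the question.
sample = (
    "<urlset>\n"
    "<url> <loc>http://sitename.com/programming/php/</loc> </url>\n"
    "<url> <loc>http://sitename.com/programming/php/?C=D;O=A</loc> </url>\n"
    "<url>\n  <loc>http://sitename.com/programming/?C=N;O=D</loc>\n</url>\n"
    "</urlset>\n"
)

cleaned = strip_matching_entries(sample)
print(cleaned)
```

The first entry survives because the lookahead stops the match at its own </url> before any C=...;O=... text is found.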

+10

Using regex to parse XML is not recommended. Instead, use an XML parser for your language to extract each <url> node, and then apply a regex to the node's contents. XPath is a useful XML query language that many XML libraries support.
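
As a rough sketch of that approach, the following uses Python's standard xml.etree.ElementTree, whose findall accepts a limited subset of XPath. The function name, the sample document, and the C=<letter>;O=<letter> pattern are assumptions for illustration. The sample has no XML namespace, as in the question's snippet; a real sitemap usually declares one, in which case the tag names would need to be namespace-qualified.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the unwanted URLs carry a query like C=D;O=A (letters may vary).
SORT_QUERY = re.compile(r"C=[A-Z];O=[A-Z]")

def remove_matching_urls(xml_text: str) -> str:
    """Parse the sitemap, drop <url> nodes whose <loc> matches, and serialize it again."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing children inside the loop is safe.
    for url in root.findall("url"):
        loc = url.findtext("loc", default="")
        if SORT_QUERY.search(loc):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample without a namespace, mirroring the question's snippet.
sample_xml = """<urlset>
  <url><loc>http://sitename.com/programming/php/</loc></url>
  <url><loc>http://sitename.com/programming/php/?C=D;O=A</loc><changefreq>weekly</changefreq></url>
</urlset>"""

result = remove_matching_urls(sample_xml)
print(result)
```

Here the regex only ever sees the text of a single loc element, so it cannot accidentally match across entries.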

+7

If you absolutely must use a regex, here is one:

 <([a-z][a-z0-9]*)\b[^>]*>(.*?)(C=D;O=A){1}(.*?)</\1> 

This will give you the line:

http://sitename.com/programming/php/?C=D;O=A

From there, I would go up to the parent tag and do whatever is needed.

0