MySQL regex for finding internal URLs with a double slash in the post body

We have a WordPress website, and we found that at some point a faulty plugin or user error introduced double slashes into internal URLs after the site name (for example, http://example.site//category1/ or http://example.site/category1//category2/).

The query below seems to work, but it appears to return fewer results than it should.

 SELECT id, post_content
 FROM `wp_posts`
 WHERE post_content REGEXP '(href="[^"]*[^:]\/\/[^"]*)'
   AND post_status IN ('draft', 'publish')
 ORDER BY id ASC

Is there a better way to do this? I do not want it to match the double slash that follows http:, hence the negated character class [^:].

Edit: to clarify, I want to find all posts (the body of a WordPress post or page) that contain a hard-coded URL with a double slash, without matching the double slash after http:.

The regexp should match the following: http://example.site//category1/ or http://example.site/category1//category2/ or even http://example.site/category1/category2// or example.site/category1//category2/

But it should not match the following: http://example.site/category1/ or http://example.site/category1/category2/

2 answers

Perhaps something like this will work.

 -- Remove the scheme prefixes wherever they occur, then search what is left for a double slash
 SELECT * FROM wp_posts
 WHERE REPLACE(REPLACE(post_content, 'https://', ''), 'http://', '') REGEXP '//'

And here is the SQL Fiddle.
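
The idea of ignoring the scheme's own slashes before searching can be checked outside the database. The following is a minimal Python sketch and not part of the original answer: the helper name `has_double_slash` is hypothetical, and Python's `str.replace` is assumed to behave like MySQL's `REPLACE()` for these inputs. The sample URLs come from the question.

```python
# URLs the question says should be reported.
SHOULD_MATCH = [
    "http://example.site//category1/",
    "http://example.site/category1//category2/",
    "http://example.site/category1/category2//",
    "example.site/category1//category2/",
]

# URLs the question says should be left alone.
SHOULD_NOT_MATCH = [
    "http://example.site/category1/",
    "http://example.site/category1/category2/",
]


def has_double_slash(text: str) -> bool:
    """Hypothetical helper: drop scheme prefixes, then look for any '//' left."""
    stripped = text.replace("https://", "").replace("http://", "")
    return "//" in stripped


for url in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(url, has_double_slash(url))
```

Because the scheme is removed wherever it appears, the same check should also work when the URL sits in the middle of a post body rather than at its start.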

Good luck.


You can use:

 regexp '(https?:\/\/|www\\.)[^ ]*\/\/' 

This will match a post if it contains http://, https:// or www. followed by non-space characters that include //.
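
To see how this pattern behaves on the URLs from the question, here is a small Python sketch. It is an illustration only: Python's `re` module stands in for MySQL's `REGEXP` (an assumption that the two engines agree on a pattern this simple), and the names `PATTERN`, `SAMPLES` and `matches` are hypothetical.

```python
import re

# The pattern from the answer, written as a Python raw string.
PATTERN = re.compile(r"(https?://|www\.)[^ ]*//")

# Sample URLs from the question, mapped to whether this pattern finds them.
SAMPLES = {
    "http://example.site//category1/": True,
    "http://example.site/category1//category2/": True,
    "http://example.site/category1/category2//": True,
    # No scheme and no "www." to anchor on, so this one is not found.
    "example.site/category1//category2/": False,
    "http://example.site/category1/": False,
    "http://example.site/category1/category2/": False,
}


def matches(text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return PATTERN.search(text) is not None


for url, expected in SAMPLES.items():
    print(url, matches(url), "(expected %s)" % expected)
```

Note that the scheme-less example from the question, example.site/category1//category2/, is not found by this pattern, since it requires a leading http://, https:// or www.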

See the SQLFiddle (adapted from sgeddes's script).

Or you can simply reduce your regular expression to '[^:]\/\/' and look for posts containing a match.
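
The reduced pattern can be tried on a few HTML fragments with the Python sketch below. Again this is only an illustration: Python's `re` is assumed to agree with MySQL's `REGEXP` for this pattern, and the helper name and the sample fragments (including the `cdn.example.site` host) are made up for the example.

```python
import re

# "Any character other than ':' followed by two slashes".
PATTERN = re.compile(r"[^:]//")


def has_stray_double_slash(text: str) -> bool:
    """Hypothetical helper: True if a '//' appears that is not preceded by ':'."""
    return PATTERN.search(text) is not None


# A broken internal link is reported.
broken = '<a href="http://example.site/category1//category2/">x</a>'
# A clean internal link is not.
clean = '<a href="http://example.site/category1/">x</a>'
# A protocol-relative URL is also reported, because '"' is not ':'.
protocol_relative = '<script src="//cdn.example.site/app.js"></script>'

for fragment in (broken, clean, protocol_relative):
    print(has_stray_double_slash(fragment), fragment)
```

The last fragment shows one trade-off of the shorter pattern: posts that use protocol-relative URLs such as src="//..." would be reported as well, so the results may need a manual pass.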
