Regex: check URL path without request parameters

Question

I am not a regular expression expert, and I have been racking my brains over something that seems very simple, in Python 2.7: validating a URL path (without the hostname) that has no query string. In other words, a string that starts with / and contains only alphanumeric characters, with no special characters other than these: / . , -

I found this post, which is very similar to what I need, but it doesn't work for me at all: if I test it with aaa, for example, it returns true even though the string does not start with /.

The regex I currently have is:

 [^/+a-zA-Z0-9.-]

but it does not reject paths that do not start with /. For instance:

/aaa → true, as expected
/aaa/bbb → true, as expected
/aaa?q=x → false, as expected
aaa → true, which is NOT expected

+6

python url regex path

Juancho Oct 17 '12 at 6:57

4 answers

"In other words, a line that starts with / allows alphanumeric values and does not allow any other special characters besides these: / . -"

You are missing some characters that are valid in URLs:

    # Python 2.7; in Python 3, urlparse and urllib.unquote live in urllib.parse
    import string
    import urllib
    import urlparse

    # Whitelist of characters allowed in a path
    valid_chars = string.letters + string.digits + '/.-~'
    valid_paths = []
    urls = ['http://www.my.uni.edu/info/matriculation/enroling.html',
            'http://info.my.org/AboutUs/Phonebook',
            'http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98',
            'http://www.my.org/462F4F2D4241522A314159265358979323846',
            'http://www.myu.edu/org/admin/people#andy',
            'http://www.w3.org/RDB/EMP?*%20where%20name%%3Ddobbins']

    for url in urls:
        # Drop scheme, host, query and fragment, then decode %XX escapes
        path = urllib.unquote(urlparse.urlparse(url).path)
        # startswith() is safe on an empty path, unlike path[0]
        if path.startswith('/') and all(c in valid_chars for c in path):
            valid_paths.append(path)
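For readers on Python 3, the same check might be sketched as follows. This is a port under assumptions, not the answerer's code: urlparse and unquote are imported from urllib.parse, string.ascii_letters stands in for string.letters, and the helper name is_valid_url_path is made up for illustration.

```python
import string
from urllib.parse import unquote, urlparse

# Same whitelist as in the answer above; string.letters does not exist in Python 3.
VALID_CHARS = set(string.ascii_letters + string.digits + "/.-~")


def is_valid_url_path(url):
    """Hypothetical helper: True if the decoded path of `url` starts with '/'
    and contains only whitelisted characters."""
    path = unquote(urlparse(url).path)
    return path.startswith("/") and all(ch in VALID_CHARS for ch in path)


urls = [
    "http://www.my.uni.edu/info/matriculation/enroling.html",
    "http://info.my.org/AboutUs/Phonebook",
    "http://www.myu.edu/org/admin/people#andy",
]
# Collect the decoded paths of the URLs that pass the check.
valid_paths = [unquote(urlparse(u).path) for u in urls if is_valid_url_path(u)]
```

Because urlparse splits off the query string and the fragment before the check runs, a URL such as the #andy example passes on its path alone.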

+3

Burhan khalid Oct 17 '12 at 7:26

Try the following:

 ^(?:/[A-Za-z0-9.-&&[^/]]*)+$

It seems to need work.
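Note that && inside a character class is Java-style class intersection, which Python's re module does not support. A rough Python equivalent, assuming the intent is "one or more segments, each a / followed by allowed non-slash characters", might look like this (the name SEGMENT_RE is illustrative only):

```python
import re

# "/" followed by zero or more allowed characters, repeated one or more times.
# The class already excludes "/", so no intersection is needed.
SEGMENT_RE = re.compile(r"^(?:/[A-Za-z0-9.-]*)+$")

for candidate in ["/aaa", "/aaa/bbb", "/aaa?q=x", "aaa"]:
    print(candidate, bool(SEGMENT_RE.match(candidate)))
```

Under this reading the pattern also accepts a bare / and empty segments such as //, which may or may not be wanted.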

0

Gábor Lipták Oct 17 '12 at 7:07

Try adding some more code to your question; I cannot tell how you are using your regular expression. What puzzles me is that your regex [^/+a-zA-Z0-9.-] basically says:

Match one character if it is not a /, a +, a letter a-z (upper or lower case), a digit 0-9, a dot, or a dash.

This makes no sense to me without knowing how you use it, since it matches only a single character, not an entire URL string.

I'm also not sure I understand why it can't start with /.
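To make this concrete, here is a small sketch of what the negated class does when used with re.search. How the asker actually calls the regex is not shown, so the "valid if nothing matches" logic in looks_valid is an assumption; it happens to reproduce the results listed in the question.

```python
import re

# Negated class: matches ONE character that is not /, +, a letter, a digit, . or -
FORBIDDEN = re.compile(r"[^/+a-zA-Z0-9.-]")


def looks_valid(path):
    # Assumed usage: a path "passes" when no forbidden character is found anywhere.
    return FORBIDDEN.search(path) is None


for candidate in ["/aaa", "/aaa/bbb", "/aaa?q=x", "aaa"]:
    print(candidate, looks_valid(candidate))
```

Nothing in this pattern ties the match to the start of the string, which is why aaa passes.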

0

Morten jensen Oct 17 '12 at 7:08

Andrew Cheong · Accepted Answer · 2012-10-17T07:08:01+0000

The regular expression you defined is just a (negated) character class, so it matches only a single character. Try this instead:

 ^\/[/.a-zA-Z0-9-]+$
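A minimal sketch of how this pattern could be used from Python, run against the four examples in the question. The names PATH_RE and is_valid_path are illustrative; the \/ escape is dropped because / needs no escaping in Python's re.

```python
import re

# Anchored pattern: a leading "/" followed by one or more allowed characters.
PATH_RE = re.compile(r"^/[/.a-zA-Z0-9-]+$")


def is_valid_path(path):
    # Hypothetical helper wrapping the accepted answer's regex.
    return PATH_RE.match(path) is not None


for candidate in ["/aaa", "/aaa/bbb", "/aaa?q=x", "aaa"]:
    print(candidate, is_valid_path(candidate))
```

Two things to be aware of: because of the +, a bare / is rejected, and $ also matches just before a trailing newline, so replacing it with \Z may be preferable if input can end in a newline.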
