Regex: check URL path without request parameters

I am not a regular expression expert, and I am racking my brains over something that seems very simple. Using Python 2.7, I want to validate a URL path (without the hostname) that has no query string. In other words, the string must start with /, may contain alphanumeric characters, and may contain no special characters other than these: /, . and -

I found this post, which is very similar to what I need, but it doesn't work for me at all. If I test it with aaa, for example, it returns true even though the string does not start with /.

The regex that I currently have is as follows:

 [^/+a-zA-Z0-9.-] 

but it gives the wrong result for paths that do not start with /. For instance:

  • /aaa → true, as expected
  • /aaa/bbb → true, as expected
  • /aaa?q=x → false, as expected
  • aaa → true, which is NOT expected
+6
4 answers

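The regular expression that you defined is just a negated character class: it matches a single character outside the listed set, anywhere in the string, and says nothing about how the string must start. Try an anchored pattern instead: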

 ^\/[/.a-zA-Z0-9-]+$ 
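As a minimal sketch of how this anchored pattern could be used from Python (it runs on 2.7 and 3), the snippet below wraps it in a helper and tries the four inputs from the question. The name `is_valid_path` is hypothetical, chosen only for illustration. Note that the pattern requires at least one character after the leading slash, and that `$` also matches just before a trailing newline; `\Z` could be used instead if that matters.

```python
import re

# Anchored pattern from the answer: a leading slash followed by one or more
# allowed characters (slash, dot, hyphen, ASCII letters, digits).
PATH_RE = re.compile(r'^/[/.a-zA-Z0-9-]+$')


def is_valid_path(path):
    """Return True if the whole string matches the anchored pattern."""
    return PATH_RE.match(path) is not None


# The four inputs listed in the question.
for candidate in ['/aaa', '/aaa/bbb', '/aaa?q=x', 'aaa']:
    print('%s -> %s' % (candidate, is_valid_path(candidate)))
```

With this pattern, `aaa` is rejected because the first character must be a slash, and `/aaa?q=x` is rejected because `?` and `=` are not in the allowed set.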
+3

In other words, a line that starts with / allows alphanumeric values and does not allow any other special characters besides these: /, ., -

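You are missing some characters that are valid in URLs, such as ~ and percent-encoded sequences. One way to handle them is to decode the path first and then check it against a list of allowed characters: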

    # Python 2.7; in Python 3 these functions live in urllib.parse.
    import string
    import urllib
    import urlparse

    # Characters allowed in a decoded path.
    valid_chars = string.ascii_letters + string.digits + '/.-~'
    valid_paths = []
    urls = ['http://www.my.uni.edu/info/matriculation/enroling.html',
            'http://info.my.org/AboutUs/Phonebook',
            'http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98',
            'http://www.my.org/462F4F2D4241522A314159265358979323846',
            'http://www.myu.edu/org/admin/people#andy',
            'http://www.w3.org/RDB/EMP?*%20where%20name%%3Ddobbins']

    for url in urls:
        # Drop the host, query string and fragment, then decode %XX escapes.
        path = urllib.unquote(urlparse.urlparse(url).path)
        # Keep paths that start with / and contain only allowed characters.
        if path.startswith('/') and all(c in valid_chars for c in path):
            valid_paths.append(path)
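The code in this answer targets Python 2.7. For readers on Python 3, the same idea might look like the sketch below, which uses `urllib.parse` from the standard library. The helper name `valid_path` and the `example.com` URL in the usage lines are hypothetical, introduced only for illustration; the allowed-character set is the one from the answer.

```python
import string
from urllib.parse import unquote, urlparse

# Same allowed characters as in the answer: letters, digits, and / . - ~
VALID_CHARS = set(string.ascii_letters + string.digits + '/.-~')


def valid_path(url):
    """Return the decoded path if it starts with / and uses only allowed
    characters; otherwise return None."""
    # urlparse separates the path from the host, query string and fragment.
    path = unquote(urlparse(url).path)
    if path.startswith('/') and all(ch in VALID_CHARS for ch in path):
        return path
    return None


print(valid_path('http://info.my.org/AboutUs/Phonebook'))
print(valid_path('http://example.com/a_b'))
```

Because `urlparse` has already split off the query string and fragment, only the path itself is checked; an underscore is not in the allowed set, so the second call returns `None`.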
+3

Try the following:

 ^(?:/[A-Za-z0-9.-]*)+$

It seems to need work.

0

Please add some more code; I cannot tell from your question how you are using the regular expression. What bothers me is that your regex [^/+a-zA-Z0-9.-] basically says:

Match one character if it is:

not a /, a +, a letter a-z (upper or lower case), a digit 0-9, a dot, or a dash

This makes no sense to me without knowing how you use it, since it matches only one character, not an entire URL string.
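To make the single-character behaviour concrete, here is a small sketch that searches each test string for a character matched by the class from the question. It is an assumption that the asker's check worked roughly this way (treating "no forbidden character found" as valid); the question does not show the calling code, and the helper name `first_forbidden` is hypothetical.

```python
import re

# The character class from the question: matches ONE character that is not
# a slash, plus sign, ASCII letter, digit, dot, or hyphen.
FORBIDDEN_CHAR = re.compile(r'[^/+a-zA-Z0-9.-]')


def first_forbidden(path):
    """Return the first disallowed character in path, or None if none exists."""
    match = FORBIDDEN_CHAR.search(path)
    return match.group(0) if match else None


for candidate in ['/aaa', '/aaa?q=x', 'aaa']:
    print('%r -> %r' % (candidate, first_forbidden(candidate)))
```

Under that assumption, `aaa` passes simply because it contains no forbidden character; the class says nothing about where the string starts.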

I'm not sure I understand why you can't start with /.

0

Source: https://habr.com/ru/post/927905/

