How to extract an optional query parameter using regex in JavaScript

I would like to create a regular expression that checks the path and the foo parameter (a non-negative integer). "foo" is optional. It must:

MATCH

    path?foo=67                 # path found, foo = 67
    path?foo=67&bar=hello       # path found, foo = 67
    path?bar=bye&foo=1&baz=12   # path found, foo = 1
    path?bar=123                # path found, foo = ''
    path                        # path found, foo = ''

DO NOT MATCH

    path?foo=37signals          # foo is not an integer
    path?foo=-8                 # foo cannot be negative
    something?foo=1             # path not found

Also, I would like to get the foo value without doing an additional match.

What would be the simplest regular expression for this?

+8
javascript regex
10 answers

Answer

Skip the hard work, I just want the answer! OK, here you go:

    var regex = /^path(?:(?=\?)(?:[?&]foo=(\d*)(?=[&#]|$)|(?![?&]foo=)[^#])+)?(?=#|$)/,
        URIs = [
          'path',                 // valid!
          'pathbreak',            // invalid path
          'path?foo=123',         // valid!
          'path?foo=-123',        // negative
          'invalid?foo=1',        // invalid path
          'path?foo=123&bar=abc', // valid!
          'path?bar=abc&foo=123', // valid!
          'path?bar=foo',         // valid!
          'path?foo',             // valid!
          'path#anchor',          // valid!
          'path#foo=bar',         // valid!
          'path?foo=123#bar',     // valid!
          'path?foo=123abc',      // not an integer
        ];

    // Run every URI through the regex and report the captured foo value.
    for (var i = 0; i < URIs.length; i++) {
      var URI = URIs[i],
          match = regex.exec(URI);

      if (match) {
        var foo = match[1] ? match[1] : 'null';
        console.log(URI + ' matched, foo = ' + foo);
      } else {
        console.log(URI + ' is invalid...');
      }
    }

Research

The request asks for "credible and/or official sources", so I will quote the RFC on query strings.

The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

This seems rather vague: a query string starts with the first ? and is terminated by a # (start of the anchor) or the end of the URI (or the end of the string/line in our case). The RFC goes on to mention that most data sets are key=value pairs, which is what you seem to expect to parse (so let's assume that is the case).

However, as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.

Given all this, let's assume a few things about your URIs:

  • Your examples start with the path, so the path runs from the beginning of the string up to a ? (query string), a # (anchor), or the end of the string.
  • The query string is the iffy part, since the RFC does not really define a "norm". A browser typically expects the query string to be generated from a form submission, as a list of key=value pairs joined by &. Keeping this mentality:
    • The key cannot be null, is preceded by ? or &, and cannot contain =, &, or #.
    • The value is optional; it is preceded by key= and cannot contain & or #.
  • Everything after the # character is the anchor.

To begin!

Let's start by mapping out the basic URI structure. You have a path, which is the characters from the start of the string up to a ?, a #, or the end of the string. You have an optional query string, which starts with a ? and continues up to a # or the end of the string. And you have an optional anchor, which starts with a # and runs to the end of the string.

 ^ ([^?#]+) (?: \? ([^#]+) )? (?: # (.*) )? $ 
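As a quick check of that structure, here is a minimal JavaScript sketch that applies the same expression (with the readability spaces removed) and names the three pieces. The helper name `splitUri` and the returned field names are illustrative assumptions, not part of the original answer.

```javascript
// Basic URI structure from the text, written without the readability spaces.
const URI_PARTS = /^([^?#]+)(?:\?([^#]+))?(?:#(.*))?$/;

// Hypothetical helper: returns the three pieces, or null if the URI does not fit.
function splitUri(uri) {
  const m = URI_PARTS.exec(uri);
  if (!m) return null;
  return {
    path: m[1],
    query: m[2] === undefined ? null : m[2],   // text between ? and # (or the end)
    anchor: m[3] === undefined ? null : m[3],  // text after #
  };
}

console.log(splitUri('path?bar=bye&foo=1#top'));
console.log(splitUri('path'));
```

Because the query group uses `+`, a bare `path?` with nothing after the question mark does not fit this first draft.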

Let's do some cleanup before digging into the query string. You can easily require the path to equal a specific value by replacing the first capture group. Whatever you replace it with (path), it must be followed by an optional query string, an optional anchor, and the end of the string (no more, no less). Since you do not need to parse the anchor, its capture group can be replaced by ending the match at either a # or the end of the string (which is the end of the query string).

 ^path (?: \? ([^#]+) )? (?=#|$) 
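A small JavaScript sketch of this intermediate step, assuming the intended character class is `[^#]+`: the path is pinned to the literal `path` and the only capture is the raw query string. The name `PATH_WITH_QUERY` is made up for illustration.

```javascript
// Literal path, optional query string, then an anchor or the end of the string.
const PATH_WITH_QUERY = /^path(?:\?([^#]+))?(?=#|$)/;

const samples = ['path', 'path?foo=1&bar=2', 'path?foo=1#top', 'pathbreak', 'other?foo=1'];

for (const uri of samples) {
  const m = PATH_WITH_QUERY.exec(uri);
  // m[1] is undefined when there is no query string.
  console.log(uri, '->', m ? 'match, query = ' + m[1] : 'no match');
}
```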

Stop Messing Around

Okay, I did a lot of setup without worrying about your specific example. The next expression matches a specific path (path) and optionally matches a query string while capturing the value of a foo parameter. This means you could stop here and validate afterwards: for the URI to be acceptable, the first capture group must be null or a non-negative integer. But that was not your question, was it? This got a lot more complicated, so I will explain the expression inline:

    ^              (?# match beginning of the string)
    path           (?# match path literally)
    (?:            (?# begin optional non-capturing group)
      (?=\?)       (?# lookahead for a literal ?)
      (?:          (?# begin repeating non-capturing group)
        [?&]       (?# keys are preceded by ? or &)
        foo        (?# match key literally)
        (?:        (?# begin optional non-capturing group)
          =        (?# values are preceded by =)
          ([^&#]*) (?# values are 0+ length and do not contain & or #)
        )          (?# end optional non-capturing group)
      |            (?# OR)
        [^#]       (?# query strings are non-# characters)
      )+           (?# end repeating non-capturing group)
    )?             (?# end optional non-capturing group)
    (?=#|$)        (?# lookahead for a literal # or end of the string)

Some key takeaways here:

  • Older JavaScript engines do not support lookbehinds (they were only added in ES2018), so you cannot look behind for a ? or & before the foo key. You therefore have to actually match one of those characters, which in turn means the start of the query string (which looks for a ?) has to be a lookahead so that the ? is not consumed. It also means the query string is always at least one character long (the ?), so you want to repeat the query string character [^#] one or more times.
  • The query string is now consumed one character at a time in a non-capturing group, unless the expression sees the key foo, in which case it captures the optional value and keeps repeating.
  • Since this non-capturing query string group repeats all the way to the anchor or the end of the URI, a second foo value ( path?foo=123&foo=bar ) overwrites the initially captured value. This means you cannot rely 100% on the above solution.
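To see the overwrite behavior from the last point, and that the value is not validated yet, here is the expression collapsed onto one line (JavaScript regex literals have no free-spacing mode or `(?#...)` comments) and run against two inputs. This is only a sketch; the name `CAPTURE_FOO` is illustrative.

```javascript
// Intermediate expression: captures whatever follows foo=, validated or not.
const CAPTURE_FOO = /^path(?:(?=\?)(?:[?&]foo(?:=([^&#]*))|[^#])+)?(?=#|$)/;

// A second foo overwrites the first captured value.
console.log(CAPTURE_FOO.exec('path?foo=123&foo=bar')[1]); // "bar"

// Nothing restricts the value to digits yet.
console.log(CAPTURE_FOO.exec('path?foo=37signals')[1]);   // "37signals"
```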

The final solution?

Okay. Now that the foo value is captured, it is time to kill the match for values that are not non-negative integers.

    ^              (?# match beginning of the string)
    path           (?# match path literally)
    (?:            (?# begin optional non-capturing group)
      (?=\?)       (?# lookahead for a literal ?)
      (?:          (?# begin repeating non-capturing group)
        [?&]       (?# keys are preceded by ? or &)
        foo        (?# match key literally)
        =          (?# values are preceded by =)
        (\d*)      (?# value must be a non-negative integer)
        (?=        (?# begin lookahead)
          [&#]     (?# literally match & or #)
          |        (?# OR)
          $        (?# match end of the string)
        )          (?# end lookahead)
      |            (?# OR)
        (?!        (?# begin negative lookahead)
          [?&]     (?# literally match ? or &)
          foo=     (?# literally match foo=)
        )          (?# end negative lookahead)
        [^#]       (?# query strings are non-# characters)
      )+           (?# end repeating non-capturing group)
    )?             (?# end optional non-capturing group)
    (?=#|$)        (?# lookahead for a literal # or end of the string)

Let's take a closer look at some of the juju that went into this expression:

  • After finding foo=\d* we use a lookahead to make sure it is followed by & , # , or the end of the string (the end of a query string value).
  • However, if more text follows foo=\d* , the alternation would fall back to the generic [^#] match right at the [?&] before foo . That is not good, because the expression would keep matching! So before looking for a generic query string character ( [^#] ), you must make sure you are not looking at a foo (which has to be handled by the first alternative). This is where the negative lookahead (?![?&]foo=) comes in handy.
  • This works with multiple foo keys, since they all have to be non-negative integers. It also allows foo to be optional (or equal to null ).
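For reference, here is a hedged sketch of the same expression as a one-line JavaScript literal, wrapped in a hypothetical `matchFoo` helper (the name and return shape are assumptions for illustration). One caveat worth knowing: in JavaScript, a capture group inside a repeated group is reset at the start of each repetition, so when other parameters follow foo the URI still matches but group 1 comes back `undefined`. PCRE-style engines keep the earlier capture.

```javascript
const FOO_REGEX = /^path(?:(?=\?)(?:[?&]foo=(\d*)(?=[&#]|$)|(?![?&]foo=)[^#])+)?(?=#|$)/;

// Hypothetical helper: null when the URI is rejected, otherwise the captured foo.
function matchFoo(uri) {
  const m = FOO_REGEX.exec(uri);
  return m ? { foo: m[1] } : null;
}

console.log(matchFoo('path?foo=67'));           // { foo: '67' }
console.log(matchFoo('path?bar=bye&foo=1'));    // { foo: '1' }
console.log(matchFoo('path?foo=37signals'));    // null (not an integer)
console.log(matchFoo('path?foo=67&bar=hello')); // { foo: undefined } in JavaScript
```

If the value is needed regardless of where foo sits in the query string, an expression that does not capture inside a repeated group avoids this reset.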

Disclaimer: most of the Regex101 demos use PHP for better syntax highlighting and include \n in negated character classes, since the examples span multiple lines.

+17

Good question! It seems fairly simple at first, but there are a lot of gotchas. I would advise checking that any proposed solution also handles the following:

ADDITIONAL MATCH TESTS

    path?                       # path found, foo = ''
    path#foo                    # path found, foo = ''
    path#bar                    # path found, foo = ''
    path?foo=                   # path found, foo = ''
    path?bar=1&foo=             # path found, foo = ''
    path?foo=&bar=1             # path found, foo = ''
    path?foo=1#bar              # path found, foo = 1
    path?foo=1&foo=2            # path found, foo = 2
    path?foofoo=1               # path found, foo = ''
    path?bar=123&foofoo=1       # path found, foo = ''

ADDITIONAL DO NOT MATCH TESTS

    pathbar?                    # path not found
    pathbar?foo=1               # path not found
    pathbar?bar=123&foo=1       # path not found
    path?foo=a&foofoo=1         # not an integer
    path?foofoo=1&foo=a         # not an integer

The simplest regular expression that I could come up with works for all these extra cases:

 path(?=(\?|$|#))(\?(.+&)?foo=(\d*)(&|#|$)|((?![?&]foo=).)*$) 

However, I would recommend adding ?: to the unused capture groups so that they are ignored and you can easily get the foo value from group 1. See the Debuggex Demo.

 path(?=(?:\?|$|#))(?:\?(?:.+&)?foo=(\d*)(?:&|#|$)|(?:(?![?&]foo=).)*$) 
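A quick way to check this expression against cases from the question and the extra tests is a small table-driven loop. The sketch below is illustrative (the `cases` array and the `fooOf` helper are assumptions, not part of the answer); it treats a missing group 1 as an empty foo.

```javascript
const RE = /path(?=(?:\?|$|#))(?:\?(?:.+&)?foo=(\d*)(?:&|#|$)|(?:(?![?&]foo=).)*$)/;

// Hypothetical helper: null when there is no match, otherwise foo ('' if absent).
function fooOf(uri) {
  const m = RE.exec(uri);
  return m ? (m[1] || '') : null;
}

// [input, expected foo] pairs; null means "must not match".
const cases = [
  ['path?foo=67', '67'],
  ['path?foo=67&bar=hello', '67'],
  ['path?bar=bye&foo=1&baz=12', '1'],
  ['path?bar=123', ''],
  ['path', ''],
  ['path?foo=1&foo=2', '2'],
  ['path?foo=37signals', null],
  ['path?foo=-8', null],
  ['something?foo=1', null],
  ['pathbar?foo=1', null],
];

for (const [uri, expected] of cases) {
  console.log(fooOf(uri) === expected ? 'ok  ' : 'FAIL', uri);
}
```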


+5
 ^path\b(?!.*[?&]foo=(?!\d+(?=&|#|$)))(?:.*[?&]foo=(\d+)(?=&|#|$))? 

Basically, I just broke it into three parts.

    ^path\b                             # starts with path
    (?!.*[?&]foo=(?!\d+(?=&|#|$)))      # not followed by foo with an invalid value
    (?:.*[?&]foo=(\d+)(?=&|#|$))?       # possibly followed by foo with a valid value

See the validation here: http://regexr.com/39i7g

Caveats:

will match path#bar=1&foo=27

will not match path?foo=

The OP does not mention these requirements, and since he wants a simple regular expression (oxymoron?), I have not tried to solve them.
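Both caveats can be reproduced with a few lines of JavaScript. This is only a sketch for checking the behavior described above; the variable name `re3` is arbitrary.

```javascript
const re3 = /^path\b(?!.*[?&]foo=(?!\d+(?=&|#|$)))(?:.*[?&]foo=(\d+)(?=&|#|$))?/;

console.log(re3.exec('path?bar=bye&foo=1&baz=12')[1]); // "1"
console.log(re3.exec('path?foo=37signals'));           // null

// Caveat 1: a foo inside the anchor is still picked up.
console.log(re3.exec('path#bar=1&foo=27')[1]);         // "27"

// Caveat 2: an empty foo value is rejected because \d+ needs at least one digit.
console.log(re3.exec('path?foo='));                    // null
```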

+4
 path.+?(?:foo=(\d+))(?![a-zA-Z\d])|path((?!foo).)*$ 

You can try this. See the demo:

http://regex101.com/r/jT3pG3/10

+2

You can try the following regular expression:

 path(?:.*?foo=(\d+)\b|()(?!.*foo)) 

demo regex101

After path , two matches are possible:

.*?foo=(\d+)\b i.e. foo followed by numbers.

OR

()(?!.*foo) i.e. an empty string if there is no foo ahead.

Add some word boundaries ( \b ) if you do not want the regular expression to pick up foo inside other words (for example, another parameter named barfoobar ).

 path(?:.*?\bfoo=(\d+)\b|()(?!.*\bfoo\b)) 
+2

You can check for the existence of the 3rd matched group. If it is not there, the value of foo is null; otherwise it is the group itself:

 /^(path)(?:$|\?(?:(?=.*\b(foo=)(\d+)\b.*$)|(?!foo=).*?))/gm 

Example in regex101: http://regex101.com/r/oP6lU7/1

+1

Working with JavaScript's regex engine, despite all the shortcomings it has compared to PCRE, is somehow enjoyable!

I made this RegEx simple and straightforward:

 ^(?=path\?).*foo=(\d*)(?:&|$)|path$ 

Explanation

    ^(?=path\?)          # A positive lookahead to ensure we have "path" at the very beginning
    .*foo=(\d*)(?:&|$)   # Look for a string that includes foo=(zero or more digits) followed by a "&" character or the end of the string
    |                    # OR
    path$                # Just "path" itself

Runnable snippet:

    var re = /^(?=path\?).*foo=(\d*)(?:&|$)|path$/gm;
    var str = 'path?foo=67\npath?foo=67&bar=hello\npath?bar=bye&foo=1&baz=12\npath\npathtest\npath?foo=37signals\npath?foo=-8\nsomething?foo=1';
    var m, n = [];

    while ((m = re.exec(str)) != null) {
      // Avoid an infinite loop on zero-length matches.
      if (m.index === re.lastIndex) {
        re.lastIndex++;
      }
      n.push(m[0]);
    }

    alert(JSON.stringify(n));

Or see the demo for more details.

+1
 path(?:\?(?:[^&]*&)*foo=([0-9]+)(?:[&#]|$))? 

It is about as short as most, and it reads more straightforwardly, since things that appear once in the string appear once in the RE.

We match:

  • the starting path
  • a question mark (or skip to the end)
  • some blocks terminated by ampersands
  • our parameter assignment
  • a final confirmation of the start of the next syntax element, or the end of the line

Unfortunately, it matches foo as None rather than '' when the foo parameter is omitted, but in Python (my language of choice) that is considered more appropriate. You could complain about it if you wanted to, or simply "or" the result with ''.

+1

Based on the OP's data, here is my attempt at a pattern:

 ^(path)\b(?:[^f]+|f(?!oo=))(?!\bfoo=(?!\d+\b))(?:\bfoo=(\d+)\b)? 

If the path is found, sub-pattern #1 will contain "path".
If foo is valid, sub-pattern #2 will contain the foo value, if any.

Demo

  • ^(path)\b matches "path"
  • (?:[^f]+|f(?!oo=)) followed by anything other than "foo="
  • (?!\bfoo=(?!\d+\b)) if "foo=" is found, it must not be followed by anything other than \d+\b
  • (?:\bfoo=(\d+)\b)? if a valid "foo=" is found, capture the foo value
0
    t = 'path?foo=67&bar=hello';
    console.log(t.match(/\b(foo|path)\=\d+\b/))

regex /\b(foo|path)\=\d+\b/

-1