JavaScript Regular Expression Functions

I have spent several hours on this and still cannot figure it out.

In the code below, I'm trying to understand what the regular expression in url.match does and how it works.

As written below, the code does not work. However, if I delete (?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp) , it seems to give me the result I want.

However, I do not want to delete this without understanding what it is doing.

I found a fairly useful resource, but after a few hours I still can't work out exactly what these expressions do:

https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions#Using_Parenthesized_Substring_Matches

Can someone break this down for me and explain exactly how it parses the string? The expressions themselves and the placement of the parentheses are not clear to me and, frankly, are very confusing.

Any help is appreciated.

    (function($) {
        $(document).ready(function() {
            function parse_keywords(url) {
                var matches = url.match(/.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/);
                return matches ? matches[1].split('+') : [];
            }

            myRefUrl = "http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q";
            myk1 = (parse_keywords(myRefUrl));
            kw = "";
            for (i = 0; i < myk1.length; i++) {
                if (i == (myk1.length - 1)) {
                    kw = kw + myk1[i];
                } else {
                    kw = kw + myk1[i] + '%20';
                }
            }
            console.log(kw);
            if (kw != null && kw != "" && kw != " " && kw != "%20") {
                orighref = $('a#applynlink').attr('href');
                $('a#applynlink').attr('href', orighref + '&scbi=' + kw);
            }
        });
    })(jQuery);
2 answers

Let me break this regex down.

 / 

Start regex.

 .* 

Match zero or more of any character. In effect, this lets the rest of the regular expression match anywhere in the string.
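One side effect of a leading .* is worth knowing: because it is greedy, it pushes the rest of the pattern toward the last place it can match rather than the first. The following is only a small sketch of that behavior; the sample string and the shortened patterns are made up for illustration and are not the regex from the question.

```javascript
// Made-up string with two "q" parameters, purely for illustration.
var sample = "?q=one&x=1&q=two&x=2";

// With the leading .* the engine consumes as much as it can first,
// so the capture comes from the last place the rest can match.
var greedy = sample.match(/.*(?:\?q=|&q=)([a-z]*)&x=/);

// Without it, match() still searches the whole string,
// but it stops at the first place the pattern fits.
var plain = sample.match(/(?:\?q=|&q=)([a-z]*)&x=/);

console.log(greedy[1]); // "two"
console.log(plain[1]);  // "one"
```

So the .* is not needed just to find a match somewhere in the string; it mainly changes which occurrence ends up being captured.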

 (?:\?p=|\?q=|&q=|\?s=) 

In this case, ?: means "do not capture anything matched inside this group." See http://www.regular-expressions.info/refadv.html

\? matches a literal ? . Unescaped, ? is a quantifier meaning "match 0 or 1 copies of the previous token", but here we want to match an actual ? .

Other than that, the group just lists several alternatives to choose from ( | means the group matches if either the pattern before it or the pattern after it matches).
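To see the difference between a capturing and a non-capturing group in practice, here is a minimal sketch. The sample string and variable names are assumptions made up for this example.

```javascript
// Hypothetical sample string, not taken from the question.
var sample = "search?q=hello";

// Capturing group: the matched prefix is stored at index 1.
var withCapture = sample.match(/(\?p=|\?q=|&q=|\?s=)/);

// Non-capturing group: same alternation, but nothing is stored for it.
var withoutCapture = sample.match(/(?:\?p=|\?q=|&q=|\?s=)/);

console.log(withCapture.length);    // 2 (full match plus one group)
console.log(withCapture[1]);        // "?q="
console.log(withoutCapture.length); // 1 (full match only)
console.log(withoutCapture[0]);     // "?q="
```

Both patterns match the same text; the only difference is whether the group takes up a slot in the result array.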

 ([a-zA-Z0-9 +]*) 

Now we match zero or more of any of the following characters, in any order: a-z, A-Z, 0-9, space, and + . Since this group is inside () without ?: , we capture it.
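As a rough illustration of where this character class stops, the sketch below applies it on its own to two made-up strings (the strings are assumptions, loosely modeled on the query string in the question).

```javascript
// The character class from the regex, used by itself.
var keywordChars = /[a-zA-Z0-9 +]*/;

// It consumes letters, digits, spaces and "+" and stops at the first "&".
var run = "my+keyword+from+google&ei=abc".match(keywordChars)[0];

// Because of the *, it also succeeds with an empty match
// when the very first character is not in the class.
var empty = "&ei=abc".match(keywordChars)[0];

console.log(run);          // "my+keyword+from+google"
console.log(empty.length); // 0
```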

 (?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp) 

We see another ?: , so this is another non-capturing group. Other than that, it is simply a list of literal strings separated by | , so it does not do any fancy logic.

 / 

End regex.

Overall, this regular expression scans the string for any alternative from the first non-capturing group, captures the run of allowed characters that follows it, and then requires any alternative from the second non-capturing group. The capture is whatever sits between the two non-capturing groups. (Think of it as a "sandwich": we look for the header and the footer and grab whatever is in between.)

After running the regular expression, we do the following:

 return matches ? matches[1].split('+') : [];

which takes the captured group and splits it on + into an array of strings.
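Putting the pieces together, here is a small sketch of the regex matching a URL that has both the "header" and the "footer". The URL is hypothetical and was made up so that it ends the keyword with &aq= , one of the alternatives in the last group.

```javascript
// The regex from the question, unchanged.
var re = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/;

// Hypothetical URL containing a leading "?q=" and a trailing "&aq=".
var url = "http://www.example.com/search?q=my+keyword&aq=f";

var matches = url.match(re);
var keywords = matches ? matches[1].split('+') : [];

console.log(matches[1]); // "my+keyword"
console.log(keywords);   // [ 'my', 'keyword' ]
```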


In situations like this, it is very useful to visualize the regex using www.debuggex.com (which I created). It immediately shows the structure of your regular expression and lets you step through the match.

In this case, the reason it works when you delete the last part of your regular expression is that none of the strings &toggle= , &ie=utf-8 , etc. appear in your example URL. To see this, drag the gray slider above the test string in debuggex and you will see that it never gets past the & in this last group.
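The same thing can be checked directly in a console. The sketch below runs the question's URL against the original regex and against a copy with the last group removed; the variable names are made up for this example.

```javascript
// URL from the question.
var myRefUrl = "http://www.google.com/url?sa=f&rct=j&url=https://www.mydomain.com/&q=my+keyword+from+google&ei=fUpnUaage8niAKeiICgCA&usg=AFQjCNFAlKg_w5pZzrhwopwgD12c_8z_23Q";

// Original regex: requires one of the trailing strings after the keywords.
var original = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)(?:&toggle=|&ie=utf-8|&FORM=|&aq=|&x=|&gwp)/;

// Same regex with the trailing group deleted.
var shortened = /.*(?:\?p=|\?q=|&q=|\?s=)([a-zA-Z0-9 +]*)/;

// The keywords are followed by "&ei=", which is not in the list,
// so the original regex cannot match at all.
var strict = myRefUrl.match(original);

// Without the trailing group, the capture simply ends at the "&".
var loose = myRefUrl.match(shortened);

console.log(strict);   // null
console.log(loose[1]); // "my+keyword+from+google"
```

Whether deleting the group is the right fix depends on which referrer URLs you need to support; this only shows why the example URL fails.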
