In Perl, continuous parsing of a string can be done like this:

    my $string = " a 1 # ";

    while () {
        # /c keeps pos() after a failed match, so the next branch
        # retries at the same offset instead of restarting at 0
        if ( $string =~ /\G\s+/gc ) {
            print "whitespace\n";
        }
        elsif ( $string =~ /\G[0-9]+/gc ) {
            print "integer\n";
        }
        elsif ( $string =~ /\G\w+/gc ) {
            print "word\n";
        }
        else {
            print "done\n";
            last;
        }
    }

Source: When is \G a useful regular expression application?
It produces the following output:

    whitespace
    word
    whitespace
    integer
    whitespace
    done

JavaScript (and many other regular expression flavors) has no \G pattern and no good replacement.
So I came up with a very simple solution that serves my purpose:

    //*************************************************
    // pattmatch - Matches the pattern PAT in ST starting at POS
    // note the use of "^" in PAT to simulate the \G assertion
    //*************************************************
    function pattmatch(st, pat, pos) {
        var resu;
        pat.lastIndex = 0;
        if (pos === 0)
            return pat.exec(st);              // look for any lexeme at the start
        else {
            resu = pat.exec(st.slice(pos));   // look for any lexeme in the remainder
            // translate lastIndex back to an offset in the full string
            if (resu) pat.lastIndex = pat.lastIndex + pos;
            return resu;
        } // if
    }

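One caveat when using "^" this way: alternation has the lowest precedence in a regular expression, so the anchor only covers every alternative if the alternation is wrapped in a group. A minimal sketch of the difference (the sample string and the names `loose` and `anchored` are only illustrative):

```javascript
// "^" binds tighter than "|", so only the first alternative is anchored here.
var loose = /^(\s+)|([0-9]+)|(\w+)/;
// Wrapping the alternation in a non-capturing group anchors every alternative.
var anchored = /^(?:(\s+)|([0-9]+)|(\w+))/;

var sample = "# a";   // starts with a character that no alternative accepts

var looseMatch = loose.exec(sample);        // the unanchored \w+ skips ahead and finds "a"
var anchoredMatch = anchored.exec(sample);  // nothing matches at offset 0

console.log(looseMatch && looseMatch.index);
console.log(anchoredMatch);
```

With the grouped form, a failed match means "nothing recognizable at this position", which is the behavior \G gives in Perl.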
So, the Perl example above would look like this in JavaScript (Node.js):

<!-- language: lang-js -->

    var string = " a 1 # ";
    var pos = 0, ret;
    // the group anchors every alternative to the start of the slice;
    // no "m" flag, so "^" matches only at the very start
    var getLexema = new RegExp("^(?:(\\s+)|([0-9]+)|(\\w+))", "g");

    while (pos < string.length && (ret = pattmatch(string, getLexema, pos))) {
        if (ret[1]) console.log("whitespace");
        if (ret[2]) console.log("integer");
        if (ret[3]) console.log("word");
        pos = getLexema.lastIndex;
    }
    console.log("done");

It produces the same output as the Perl code snippet:

    whitespace
    word
    whitespace
    integer
    whitespace
    done

Note that the parser stops at the # symbol. You can continue parsing from pos in another piece of code.
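As a rough sketch of how parsing might resume from a saved offset: engines that implement ES2015 offer the sticky flag `y`, which only allows a match starting exactly at `lastIndex`. This is a different technique from the slicing helper above, not a drop-in copy of it, and the names `lexeme` and `scanFrom` are hypothetical.

```javascript
// Sketch: resume tokenizing at an arbitrary offset using the sticky flag (ES2015).
// With "y", exec() only matches starting exactly at lastIndex, so no "^" or slicing is needed.
var lexeme = /(\s+)|([0-9]+)|(\w+)/y;

// scanFrom is a hypothetical helper: it collects the token kinds found from `pos`
// onward and reports the offset where scanning stopped.
function scanFrom(text, pos) {
  var kinds = [];
  var m;
  lexeme.lastIndex = pos;
  while (pos < text.length && (m = lexeme.exec(text))) {
    if (m[1]) kinds.push("whitespace");
    else if (m[2]) kinds.push("integer");
    else kinds.push("word");
    pos = lexeme.lastIndex;
  }
  return { kinds: kinds, pos: pos };
}

var text = " a 1 # ";
var first = scanFrom(text, 0);              // stops at the "#"
var rest = scanFrom(text, first.pos + 1);   // skip the "#" and carry on

console.log(first.kinds.join(" "), "| stopped at", first.pos);
console.log(rest.kinds.join(" "), "| stopped at", rest.pos);
```

The caller decides what to do with the unrecognized character; here it is simply skipped before the second call.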
❖
Is there a better way in JavaScript to simulate Perl's \G regex assertion?
Postscript
Out of curiosity, I decided to compare my own solution with @georg's suggestion. I am not claiming that either code is better; for me it is a matter of taste.
Will my system, which will be highly dependent on user interaction, become slow?
@ikegami writes about @georg's solution:
... his solution adds a reduction in the number of times your input file is copied ...
So, I decided to compare both solutions in a loop that repeats the code 10 million times:
<!-- language: lang-js -->

    var i;
    var n1, n2;
    var string, pos, m, conta, re;

    // My code
    conta = 0;
    n1 = Date.now();
    for (i = 0; i < 10000000; i++) {
        string = " a 1 # ";
        pos = 0;
        // pattern kept exactly as measured; note that "^" applies only to the first alternative
        re = new RegExp("^(\\s+)|([0-9]+)|(\\w+)", "gim");
        while (pos < string.length && (m = pattMatch(string, re, pos))) {
            if (m[1]) conta++;
            if (m[2]) conta++;
            if (m[3]) conta++;
            pos = re.lastIndex;
        } // while
    }
    n2 = Date.now();
    console.log('Mine: ', ((n2 - n1) / 1000).toFixed(2), ' seconds');

    // Other code
    conta = 0;
    n1 = Date.now();
    for (i = 0; i < 10000000; i++) {
        string = " a 1 # ";
        re = /^(?:(\s+)|([0-9]+)|(\w+))/i;
        while (m = string.match(re)) {
            if (m[1]) conta++;
            if (m[2]) conta++;
            if (m[3]) conta++;
            string = string.slice(m[0].length);   // drop the matched prefix
        }
    }
    n2 = Date.now();
    console.log('Other: ', ((n2 - n1) / 1000).toFixed(2), ' seconds');

    //*************************************************
    // pattMatch - Matches the pattern PAT in ST starting at POS
    // note the use of "^" in PAT to simulate the \G assertion
    //*************************************************
    function pattMatch(st, pat, pos) {
        var resu;
        pat.lastIndex = 0;
        if (pos === 0)
            return pat.exec(st);
        else {
            resu = pat.exec(st.slice(pos));
            if (resu) pat.lastIndex = pat.lastIndex + pos;
            return resu;
        }
    } // pattMatch

Results:

    Mine: 11.90 seconds
    Other: 10.77 seconds

My code takes about 10% longer, which works out to roughly 110 nanoseconds more per iteration of the outer loop.
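These figures follow directly from the two timings above. A quick check of the arithmetic (the variable names are only illustrative):

```javascript
// Timings reported above, in seconds, for 10 million outer-loop iterations.
var mine = 11.90;
var other = 10.77;
var iterations = 10000000;

// Relative slowdown of the slicing helper compared with the other solution.
var slowdownPercent = (mine / other - 1) * 100;
// Extra time per outer-loop iteration, converted from seconds to nanoseconds.
var extraNsPerIteration = (mine - other) / iterations * 1e9;

console.log(slowdownPercent.toFixed(1) + "% slower");
console.log(Math.round(extraNsPerIteration) + " ns extra per iteration");
```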
Honestly, I find this loss of efficiency acceptable in a system with intensive user interaction.
If my project involved heavy mathematical processing with multidimensional arrays or giant neural networks, I might reconsider.