How can I work around the lack of Perl's \G in JavaScript?

In Perl, when the string has to be parsed continuously, it can be done like this:

```perl
my $string = " a 1 # ";

while (1) {
    if ( $string =~ /\G\s+/gc ) {
        print "whitespace\n";
    }
    elsif ( $string =~ /\G[0-9]+/gc ) {
        print "integer\n";
    }
    elsif ( $string =~ /\G\w+/gc ) {
        print "word\n";
    }
    else {
        print "done\n";
        last;
    }
}
```

Source: When is \G useful in a regular expression?

It produces the following output:

```
whitespace
word
whitespace
integer
whitespace
done
```

JavaScript (like many other regex flavors) has no \G anchor and no good substitute for it.

So, I came up with a very simple solution that serves my purpose.

```javascript
//*************************************************
// pattmatch - matches the pattern PAT against the string ST, starting at POS.
// Note the use of "^" in PAT to simulate the \G anchor.
// PAT needs the "g" flag so that exec() updates lastIndex.
//*************************************************
function pattmatch(st, pat, pos) {
  var resu;
  pat.lastIndex = 0;
  if (pos === 0) return pat.exec(st); // search for any identifier
  else {
    resu = pat.exec(st.slice(pos)); // search for any identifier in the rest of the string
    if (resu) pat.lastIndex = pat.lastIndex + pos; // make lastIndex relative to st again
    return resu;
  } // if
}
```
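One detail matters when using "^" this way: in a pattern such as `^(\s+)|([0-9]+)|(\w+)`, the anchor binds only to the first alternative, so the other branches can still match further to the right. The quick check below uses a made-up leftover string `"# 1"` (an illustrative input, not from the question) to compare the ungrouped pattern with one whose alternation is wrapped in `(?:...)`:

```javascript
// "^" binds only to the first alternative unless the alternation is grouped.
const loose = /^(\s+)|([0-9]+)|(\w+)/;         // ^ applies to (\s+) only
const anchored = /^(?:(\s+)|([0-9]+)|(\w+))/;  // ^ applies to every branch

const rest = "# 1";                            // hypothetical leftover input
const looseMatch = loose.exec(rest);           // skips "# " and matches "1"
const anchoredMatch = anchored.exec(rest);     // nothing matches at index 0

console.log(looseMatch && looseMatch.index);
console.log(anchoredMatch);
```

Wrapping the branches in a non-capturing group keeps the capture group numbers unchanged, so groups 1, 2 and 3 still mean whitespace, integer and word.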

So the example above would look like this in JavaScript (Node.js):

```javascript
var string = " a 1 # ";
var pos = 0, ret;
// "^" must cover the whole alternation, otherwise only (\s+) is anchored.
// No "m" flag: with it, "^" could also match after a newline in the sliced string.
var getLexema = new RegExp("^(?:(\\s+)|([0-9]+)|(\\w+))", "g");
while (pos < string.length && (ret = pattmatch(string, getLexema, pos))) {
  if (ret[1]) console.log("whitespace");
  if (ret[2]) console.log("integer");
  if (ret[3]) console.log("word");
  pos = getLexema.lastIndex;
} // while
console.log("done");
```

It produces the same output as the Perl code snippet:

```
whitespace
word
whitespace
integer
whitespace
done
```

Note that the parser stops at the # symbol. You can continue parsing from pos in another piece of code.
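As a sketch of what continuing from `pos` could look like, the snippet below reuses the question's `pattMatch` logic (with the alternation anchored as a whole) and hands the stopping position to a second, hypothetical `symbol` pattern that consumes the `#`. The `symbol` regex and the `tokens` array are illustrative assumptions, not part of the original code:

```javascript
// Same logic as the question's helper: match PAT against ST starting at POS,
// relying on a "^"-anchored pattern with the "g" flag to emulate \G.
function pattMatch(st, pat, pos) {
  pat.lastIndex = 0;
  if (pos === 0) return pat.exec(st);
  const resu = pat.exec(st.slice(pos));
  if (resu) pat.lastIndex += pos; // translate back to an index into st
  return resu;
}

const string = " a 1 # ";
const lexeme = /^(?:(\s+)|([0-9]+)|(\w+))/g;
const symbol = /^[#;,]/g; // hypothetical second-stage pattern for punctuation
const tokens = [];
let pos = 0;
let m;

// First stage: stops at "#", which the lexeme pattern does not cover.
while (pos < string.length && (m = pattMatch(string, lexeme, pos))) {
  tokens.push(m[1] ? "whitespace" : m[2] ? "integer" : "word");
  pos = lexeme.lastIndex;
}

// Second stage: other code resumes from the same pos.
if ((m = pattMatch(string, symbol, pos))) {
  tokens.push("symbol " + m[0]);
  pos = symbol.lastIndex;
}

console.log(tokens.join(" "), "| next pos:", pos);
```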

Is there a better way in JavaScript to simulate Perl's \G anchor?

Update

Out of curiosity, I decided to compare my own solution with @georg's suggestion. I am not claiming that either piece of code is better; for me it is a matter of taste.

My question was: will my system, which will depend heavily on user interaction, become slow?

@ikegami writes about @georg's solution:

... his solution adds a reduction in the number of times your input file is copied ...
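For comparison, copying can be avoided without the /y flag by starting a global search at `pos` and rejecting any match that does not begin exactly there. This is a different technique from both solutions discussed here, shown only as a sketch; the helper name `matchAt` is hypothetical. The trade-off is that a failed attempt may scan the rest of the string before the index check rejects it:

```javascript
// Copy-free emulation of \G: no slice(), the index check does the anchoring.
function matchAt(st, pat, pos) {
  pat.lastIndex = pos;           // pat must have the "g" flag
  const m = pat.exec(st);        // leftmost match at or after pos
  if (m && m.index === pos) return m;
  pat.lastIndex = pos;           // leave lastIndex at the failure point
  return null;
}

const input = " a 1 # ";
const re = /(\s+)|([0-9]+)|(\w+)/g; // no "^": anchoring comes from the index check
const kinds = [];
let at = 0;
let tok;
while (at < input.length && (tok = matchAt(input, re, at))) {
  kinds.push(tok[1] ? "whitespace" : tok[2] ? "integer" : "word");
  at = re.lastIndex;
}
console.log(kinds.join(" "), "| stopped at", at);
```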

So, I decided to compare both solutions in a loop that repeats the code 10 million times:

```javascript
var i;
var n1, n2;
var string, pos, m, conta, re;

// My code
conta = 0;
n1 = Date.now();
for (i = 0; i < 10000000; i++) {
  string = " a 1 # ";
  pos = 0;
  re = new RegExp("^(\\s+)|([0-9]+)|(\\w+)", "gim");
  while (pos < string.length && (m = pattMatch(string, re, pos))) {
    if (m[1]) conta++;
    if (m[2]) conta++;
    if (m[3]) conta++;
    pos = re.lastIndex;
  } // while
}
n2 = Date.now();
console.log('Mine: ', ((n2 - n1) / 1000).toFixed(2), ' seconds');

// Other code
conta = 0;
n1 = Date.now();
for (i = 0; i < 10000000; i++) {
  string = " a 1 # ";
  re = /^(?:(\s+)|([0-9]+)|(\w+))/i;
  while (m = string.match(re)) {
    if (m[1]) conta++;
    if (m[2]) conta++;
    if (m[3]) conta++;
    string = string.slice(m[0].length);
  }
}
n2 = Date.now();
console.log('Other: ', ((n2 - n1) / 1000).toFixed(2), ' seconds');

//*************************************************
// pattMatch - matches the pattern PAT against the string ST, starting at POS.
// Note the use of "^" to simulate the \G anchor.
//*************************************************
function pattMatch(st, pat, pos) {
  var resu;
  pat.lastIndex = 0;
  if (pos === 0) return pat.exec(st);
  else {
    resu = pat.exec(st.slice(pos));
    if (resu) pat.lastIndex = pat.lastIndex + pos;
    return resu;
  }
} // pattMatch
```

Results:

Mine: 11.90 seconds
Other: 10.77 seconds

My code takes about 10% longer, which works out to roughly 110 nanoseconds more per iteration.
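Both figures follow from the measured totals: the difference is 11.90 - 10.77 = 1.13 seconds over 10,000,000 iterations. The snippet below just redoes that arithmetic with the numbers reported above; each "iteration" here is one full tokenization of `" a 1 # "`, not a single regex call:

```javascript
// Recompute the slowdown from the two measured totals above.
const iterations = 10000000;
const mineSeconds = 11.90;
const otherSeconds = 10.77;

const extraSeconds = mineSeconds - otherSeconds;               // about 1.13 s
const relativeSlowdown = extraSeconds / otherSeconds;          // about 0.105
const extraNsPerIteration = (extraSeconds / iterations) * 1e9; // about 113 ns

console.log((relativeSlowdown * 100).toFixed(1) + "% slower");
console.log(extraNsPerIteration.toFixed(0) + " ns extra per iteration");
```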

Honestly, given my personal preferences, I consider this loss of efficiency acceptable in a system with intensive user interaction.

If my project involved heavy mathematical processing with multidimensional arrays or giant neural networks, I might reconsider.

javascript regex perl parsing
2 answers

The functionality of \G exists in the form of the /y (sticky) flag.

```javascript
var regex = /^foo/y;
regex.lastIndex = 2;
regex.test('..foo');   // false - index 2 is not the beginning of the string

var regex2 = /^foo/my;
regex2.lastIndex = 2;
regex2.test('..foo');  // false - index 2 is not the beginning of the string or of a line
regex2.lastIndex = 2;
regex2.test('.\nfoo'); // true - index 2 is the beginning of a line
```

But this is brand new, so you may not be able to rely on it on public websites yet. Check the browser compatibility table in the linked documentation.
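Applied to the question, a sticky-flag version of the Perl loop could look like the sketch below. It assumes an engine that supports /y (ES2015 or later); `tokenize` and the `rules` table are illustrative names. Resetting each pattern's `lastIndex` to a shared `pos` plays the role of Perl's `pos()`, and /y makes every match start exactly there, as `\G` does with `/gc`:

```javascript
// Sticky-flag rewrite of the Perl \G loop: every pattern is tried at the
// same shared position, and /y forbids the match from starting anywhere else.
function tokenize(str) {
  const rules = [
    ["whitespace", /\s+/y],
    ["integer", /[0-9]+/y],
    ["word", /\w+/y],
  ];
  const out = [];
  let pos = 0;
  for (;;) {
    let matched = false;
    for (const [name, re] of rules) {
      re.lastIndex = pos;     // like Perl's pos(), shared by all patterns
      if (re.test(str)) {
        out.push(name);
        pos = re.lastIndex;   // advance past the match
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push("done");       // nothing matches at pos (e.g. at "#" or at the end)
      return { tokens: out, pos };
    }
  }
}

console.log(tokenize(" a 1 # ").tokens.join("\n"));
```

The returned `pos` marks where parsing stopped, so other code can resume from it, just as in the question.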


It looks like you are overcomplicating it a bit. exec with the g flag already continues from where the previous match ended, out of the box:

```javascript
var string = " a 1 # ",
    re = /(\s+)|([0-9]+)|(\w+)|([\s\S])/gi,
    m;

while (m = re.exec(string)) {
  if (m[1]) console.log('space');
  if (m[2]) console.log('int');
  if (m[3]) console.log('word');
  if (m[4]) console.log('unknown');
}
```
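The final `([\s\S])` group is what makes this loop safe: every character matches some branch, so the matches are contiguous. Without it, a /g scan silently skips anything it cannot match. A minimal sketch of detecting that, using an illustrative input `" a # 1"` and comparing each `m.index` with the end of the previous match:

```javascript
// Without a catch-all branch, /g skips unmatched characters; comparing
// m.index with the end of the previous match reveals the skipped gap.
const text = " a # 1";
const tokenRe = /(\s+)|([0-9]+)|(\w+)/g;
const found = [];
let expected = 0;   // where the next match should start
let gapAt = -1;     // first index that no branch matched
let mm;
while ((mm = tokenRe.exec(text))) {
  if (mm.index !== expected) { // characters between expected and mm.index were skipped
    gapAt = expected;
    break;
  }
  found.push(mm[0]);
  expected = tokenRe.lastIndex;
}
if (gapAt === -1 && expected < text.length) gapAt = expected; // unmatched tail
console.log(found, "first unmatched index:", gapAt);
```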

If your regular expression is not exhaustive (it does not cover every possible character) and you want to stop at the first mismatch, the easiest way is to anchor it with ^ and slice the string after each match:

```javascript
var string = " a 1 # ",
    re = /^(?:(\s+)|([0-9]+)|(\w+))/i,
    m;

while (m = string.match(re)) {
  if (m[1]) console.log('space');
  if (m[2]) console.log('int');
  if (m[3]) console.log('word');
  string = string.slice(m[0].length);
}
console.log('done, rest=[%s]', string);
```

This simple method does not completely replace \G (or your "match from" method), since it loses the left context of the match: assertions such as \b or lookbehinds can no longer see the characters that were sliced off.
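A small illustration of that lost context, assuming an engine with /y support for the comparison: once `"a"` is sliced off `"ab"`, `\b` sees `"b"` at the start of a new string and succeeds, whereas a sticky match at index 1 of the original string still sees the preceding `"a"` and fails. The strings are made up for the demonstration:

```javascript
// Slicing hides the characters to the left of the match position.
const src = "ab";
const pos = 1; // pretend "a" has already been consumed

// Slice-based: "b" now starts a new string, so \b succeeds.
const sliced = /^\b\w/.test(src.slice(pos));

// Sticky-based: the engine still sees "a" before "b", so \b fails.
const sticky = /\b\w/y;
sticky.lastIndex = pos;
const inPlace = sticky.test(src);

console.log({ sliced, inPlace });
```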
