
Using capture groups in String.split()

 $ node
 > "ababaabab".split(/a{2}/)
 [ 'abab', 'bab' ]
 > "ababaabab".split(/(a){2}/)
 [ 'abab', 'a', 'bab' ]
 >

This doesn't make sense to me. Can someone explain why 'a' appears in the second result?

Note: I am trying to split on doubled line terminators (possibly in Windows files), so I am splitting on /(\r?\n){2}/. However, I get extraneous '\015\n' entries in my array (note \015 == \r).

Why do they appear?

Note: this also happens in browser JS engines, so it is not specific to Node.

+7
javascript regex
5 answers

In the second result, 'a' appears because you wrapped it in a capture group, i.e. parentheses ().

If you don't want it included but still need a group, use a non-capturing group: (?:a). Putting ?: at the start of any group makes it non-capturing, so it is omitted from the resulting capture list.

Here is a simple example of this in action: http://regex101.com/r/yM1vM4
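As a rough sketch of the difference, the snippet below runs the same split with a capturing and a non-capturing group, then applies the non-capturing form to the blank-line case from the question. The sample input `doc` and the variable names are made up for illustration.

```javascript
// Same separator, once with a capturing group and once without.
const text = "ababaabab";

const capturing = text.split(/(a){2}/);      // the capture is spliced into the result
const nonCapturing = text.split(/(?:a){2}/); // nothing extra is added

console.log(capturing);    // [ 'abab', 'a', 'bab' ]
console.log(nonCapturing); // [ 'abab', 'bab' ]

// Splitting on doubled line terminators (hypothetical sample input
// mixing Windows and Unix line endings).
const doc = "first\r\n\r\nsecond\n\nthird";
const paragraphs = doc.split(/(?:\r?\n){2}/);

console.log(paragraphs);   // [ 'first', 'second', 'third' ]
```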

+9

Since {2} is outside the capturing parentheses, the separator matches two characters but the group captures only one of them (a quantified group keeps its last repetition).

If you move {2} inside the parentheses:

 "ababaabab".split(/(a{2})/) 

then you get

 ["abab", "aa", "bab"] 

If you do not want "aa" in the result, do not wrap it in parentheses, i.e.

 "ababaabab".split(/a{2}/) 

gives

 ["abab", "bab"] 
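With 'a' repeated twice it is impossible to tell which repetition the group holds. A small sketch with distinguishable alternatives suggests that a quantified group retains its final repetition; the input strings here are invented for the demonstration.

```javascript
// (a|b){2} matches "ab"; the group is overwritten on each repetition,
// so only the last one ("b") is left to be spliced into the result.
const letters = "xxabyy".split(/(a|b){2}/);
console.log(letters); // [ 'xx', 'b', 'yy' ]

// Same effect with mixed line endings: the separator matches "\r\n\n",
// but the group only holds the second terminator, "\n".
const lines = "one\r\n\ntwo".split(/(\r?\n){2}/);
console.log(JSON.stringify(lines)); // ["one","\n","two"]
```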
+2

According to the ECMAScript specification:

String.prototype.split (separator, limit)

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array.

The given example:

 "ababaabab".split(/(a){2}/) // [ "abab", "a", "bab" ] 

The split occurs on aa, but only "a" is in the capture group (a), so that is what gets spliced into the output array.
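The "including any undefined results" part of the quoted text can be seen when a separator has several groups and only one of them takes part in a given match. This is a minimal sketch with an invented input string:

```javascript
// Two capture groups in an alternation: on each match one group
// participates and the other is undefined, and both are spliced in.
const pieces = "a1bxc".split(/(\d)|(x)/);
console.log(pieces); // [ 'a', '1', undefined, 'b', undefined, 'x', 'c' ]

// If only the defined values are wanted, they can be filtered afterwards.
const defined = pieces.filter((p) => p !== undefined);
console.log(defined); // [ 'a', '1', 'b', 'x', 'c' ]
```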

Other examples:

 "ababaaxaabab".split(/(a){2}/) // ["abab", "a", "x", "a", "bab"]
 "ababaaxaabab".split(/(aa)/) // ["abab", "aa", "x", "aa", "bab"]
+2

split keeps capture groups in its output. That is why you see it in the result.

See the description of capturing parentheses:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split

+1

In regular expressions, () denotes a capture group. To group without capturing, use a non-capturing group: (?:).

0
