Using capture groups in String.split()
$ node
> "ababaabab".split(/a{2}/)
[ 'abab', 'bab' ]
> "ababaabab".split(/(a){2}/)
[ 'abab', 'a', 'bab' ]

So that doesn't make sense to me. Can someone explain this? I don't understand why 'a' appears.
Note: I am trying to split on double line breaks (possibly in Windows files), so I am splitting on /(\r?\n){2}/. However, I get extraneous '\015\n' entries in my array (note that \015 == \r).
Why do they appear?
Note: This also happens in browser JS engines, so it is not specific to Node.
In the second result, 'a' appears because you wrapped it in a capturing group (the parentheses).
If you don't want it included but still need a group (for example, so that a quantifier applies to it), use a non-capturing group: (?:a). Putting ?: right after the opening parenthesis turns any group into a non-capturing one, and it is then omitted from the resulting list of captures.
Here is a simple example of this in action: http://regex101.com/r/yM1vM4
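A minimal sketch comparing both forms on the string from the question (runnable in Node or a browser console; the variable names are only illustrative):

```javascript
// Same input string as in the question.
const input = "ababaabab";

// Capturing group: the captured "a" is spliced into the result.
const withCapture = input.split(/(a){2}/);
console.log(withCapture); // [ 'abab', 'a', 'bab' ]

// Non-capturing group: "a" is still grouped for the {2} quantifier,
// but nothing is captured, so no extra entry appears.
const withoutCapture = input.split(/(?:a){2}/);
console.log(withoutCapture); // [ 'abab', 'bab' ]
```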
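Since {2} is outside the capturing parentheses, the delimiter matches two characters, but the group holds only a single 'a'. In JavaScript a repeated group keeps only the text from its last repetition (with "aa" you cannot tell the difference, because both characters are the same).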
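With "aa" both repetitions capture the same character, so you cannot tell which one the group keeps. A small sketch using an alternation (the input strings are made up for illustration) shows that a quantified group is overwritten on each iteration and ends up holding the last one:

```javascript
// (a|b){2} matches two characters; group 1 is overwritten on each iteration.
const m = /(a|b){2}/.exec("ab");
console.log(m[0], m[1]); // ab b

// split splices in the same value: the last character of each delimiter.
const parts = "xabyxbay".split(/(a|b){2}/);
console.log(parts); // [ 'x', 'b', 'yx', 'a', 'y' ]
```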
If you move {2} inside the parentheses:
"ababaabab".split(/(a{2})/)
then you get
["abab", "aa", "bab"]
If you don't want "aa", don't wrap it in parentheses, i.e.
"ababaabab".split(/a{2}/)
gives
["abab", "bab"]

According to ECMA:
String.prototype.split (separator, limit)
If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.
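A short sketch of the two less obvious parts of that rule, using made-up inputs: a group that does not take part in a match contributes undefined, and every match contributes one entry per capturing group, in the order of their opening parentheses:

```javascript
// Two alternatives, each in its own capturing group. Only one group can
// participate in a given match, so the other one yields undefined.
const alt = "aXbYc".split(/(X)|(Y)/);
console.log(alt); // [ 'a', 'X', undefined, 'b', undefined, 'Y', 'c' ]

// Nested groups: each match adds the outer capture, then the inner one.
const nested = "ababaaxaabab".split(/((a)a)/);
console.log(nested); // [ 'abab', 'aa', 'a', 'x', 'aa', 'a', 'bab' ]
```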
The given example:
"ababaabab".split(/(a){2}/) // [ "abab", "a", "bab" ]
The split occurs on "aa", but only "a" is in the capturing group (a), so that is what is spliced into the output array.
Other examples:
"ababaaxaabab".split(/(a){2}/) // ["abab", "a", "x", "a", "bab"]
"ababaaxaabab".split(/(aa)/) // ["abab", "aa", "x", "aa", "bab"]

split keeps the contents of capturing groups in its output. That is why you see them in the result.
Read the description and look for "capturing parentheses":
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
In regular expressions, () denotes a capturing group. To group without capturing, use a non-capturing group: (?:).
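Applied to the pattern from the question, a minimal sketch (the sample text is made up and mixes Windows and Unix blank lines) shows where the extra '\r\n' entries come from and how a non-capturing group removes them:

```javascript
// Sample text with a Windows (\r\n\r\n) and a Unix (\n\n) blank line.
const text = "first\r\n\r\nsecond\n\nthird";

// Capturing group: the last line break of each separator leaks into the result.
const leaky = text.split(/(\r?\n){2}/);
console.log(leaky); // [ 'first', '\r\n', 'second', '\n', 'third' ]

// Non-capturing group: only the paragraphs remain.
const clean = text.split(/(?:\r?\n){2}/);
console.log(clean); // [ 'first', 'second', 'third' ]
```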