Implication #1. The documentation is wrong
Source: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
It says here:
Linebreak matcher
... is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
However, when we swap in the "equivalent" pattern, the result changes to false:
    String _R_ = "\\R";
    System.out.println("\r\n".matches("((?<!"+_R_+")\\s)*")); // true

    // using the "equivalent" pattern
    _R_ = "\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]";
    System.out.println("\r\n".matches("((?<!"+_R_+")\\s)*")); // false

    // now make it atomic, as per sln's answer
    _R_ = "(?>"+_R_+")";
    System.out.println("\r\n".matches("((?<!"+_R_+")\\s)*")); // true
So the Javadoc should really say:
... is equivalent to (?>\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])
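The difference between the plain alternation and an atomic version of it can also be checked in isolation, outside any lookbehind. The following is only a minimal sketch: the class name `BacktrackDemo` and the helper `followedByLf` are made up for illustration. It asks whether "\r\n" can be matched as a line break followed by a literal \n, which is only possible if the line-break part is willing to give back the \n and settle for the lone \r.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class BacktrackDemo {
    // The alternation quoted from the Javadoc, written as a Java regex string.
    static final String ALTERNATION =
            "\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]";

    // True if "\r\n" matches in full as <linebreak> followed by a literal \n.
    // This requires <linebreak> to settle for the lone \r.
    static boolean followedByLf(String linebreak) {
        return Pattern.matches(linebreak + "\\n", "\r\n");
    }

    public static void main(String[] args) {
        // Non-atomic group: \r\n is tried first, the trailing \n then fails,
        // and the engine backtracks to the single \r.
        System.out.println(followedByLf("(?:" + ALTERNATION + ")")); // true

        // Atomic group: once \r\n has been consumed there is no way back.
        System.out.println(followedByLf("(?>" + ALTERNATION + ")")); // false

        // \R itself: no expected output is given here, because the result
        // may differ between JDK versions.
        System.out.println(followedByLf("\\R"));
    }
}
```

If `\R` prints false on your JDK, it is behaving like the atomic variant; if it prints true, it is behaving like the plain alternation.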
Update, March 9, 2017, per Sherman at Oracle (JDK-8176029):
"api doc is NOT wrong, the implementation is wrong (which fails to backtrack to "0x0d + next.match()" when "0x0d + 0x0a + next.match()" fails)"
Implication #2. Lookbehinds don't only look backwards
Despite the name, a lookbehind is not only able to look behind; it can also include, and even jump over, the current position.
Consider the following example (from rexegg.com):
"_12_".replaceAll("(?<=_(?=\\d{2}_))\\d+", "##"); // _##_
"This is interesting for several reasons. First, we have a lookahead within a lookbehind, and even though we were supposed to look backwards, this lookahead jumps over the current position by matching the two digits and the trailing underscore."
What this means for our \R example is that even though our current position may be \n, that does not stop the lookbehind from noticing that its \r is followed by \n, binding the two together as an atomic group, and therefore refusing to recognise the \r just before the current position as a standalone match.
Note: for simplicity I have used phrases such as "our current position is \n"; this is not an exact description of what goes on internally.
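The same effect can be observed with explicit groups in place of \R. This is a rough sketch rather than a description of the engine's internals; the class name `LookbehindReachDemo` and the helper `lfPrecededBy` are hypothetical. It asks whether a lookbehind wrapped around the line-break alternation succeeds immediately before the \n of "\r\n".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class LookbehindReachDemo {
    // The alternation quoted from the Javadoc, written as a Java regex string.
    static final String ALTERNATION =
            "\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]";

    // True if, in "\r\n", the \n is recognised as being preceded by <linebreak>.
    static boolean lfPrecededBy(String linebreak) {
        return Pattern.compile("(?<=" + linebreak + ")\\n")
                      .matcher("\r\n")
                      .find();
    }

    public static void main(String[] args) {
        // Non-atomic: starting at the \r, the lookbehind first tries \r\n,
        // which overshoots the current position, then falls back to \r alone.
        System.out.println(lfPrecededBy("(?:" + ALTERNATION + ")")); // true

        // Atomic: \r and \n are bound together, the match overshoots the
        // current position, and there is no fallback to the lone \r.
        System.out.println(lfPrecededBy("(?>" + ALTERNATION + ")")); // false
    }
}
```

The atomic case fails precisely because the group reads past the position the lookbehind is anchored to, which is the "jumping over the current position" described above.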