What is really going on
It seems that a + following a range quantifier does not make the range quantifier possessive. Instead, it is treated as an ordinary "repeat one or more times" quantifier applied to the range-quantified expression. Taking .{1,3}+b as an example, it is equivalent to (?:.{1,3})+b.
Workaround
You can work around this with the more general non-backtracking group (or atomic grouping) construct, (?>pattern). Let's take the general case pattern{n,m}+ and build an equivalent regular expression using non-backtracking groups (equivalent to Java's behavior when matching with pattern{n,m}+):
(?>(?>pattern){n,m})
Why two levels of non-backtracking groups? Both are necessary because:
- Once a match is found for pattern (one repetition instance), backtracking into pattern is disallowed. (Note that until an instance has been found, backtracking inside pattern is still allowed.) This is emulated by the inner non-backtracking group.
- Once no more instances of pattern can be found, backtracking to give up any of the matched instances is disallowed. This is emulated by the outer non-backtracking group.
I am not sure whether there are any caveats here. Please ping me with a comment if you find a case that cannot be emulated with this method.
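As a minimal sketch of the rewrite described above, the Java snippet below builds the two-level non-backtracking form from a sub-pattern and its bounds, then checks it against Java's native possessive quantifier on one input. The class name AtomicRangeSketch and the helper possessiveRange are hypothetical names introduced only for this illustration, and a single input is of course not a proof of equivalence.

```java
import java.util.regex.Pattern;

class AtomicRangeSketch {
    // Hypothetical helper: rewrites pattern{n,m}+ as (?>(?>pattern){n,m}).
    // Inner group: no backtracking into one matched repetition.
    // Outer group: no giving back of repetitions once the loop has finished.
    static String possessiveRange(String pattern, int n, int m) {
        return "(?>(?>" + pattern + "){" + n + "," + m + "})";
    }

    public static void main(String[] args) {
        String emulated = possessiveRange(".", 1, 3) + "b";
        System.out.println(emulated); // (?>(?>.){1,3})b

        // In Java both forms refuse to give a character back for the trailing b.
        System.out.println(Pattern.compile(".{1,3}+b").matcher("aab").find()); // false
        System.out.println(Pattern.compile(emulated).matcher("aab").find());   // false
    }
}
```

The helper does no escaping or validation; it assumes the caller passes a well-formed sub-pattern and n <= m.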
Testing
Test 1
First I tested this regex:
(.{1,3}+)b
I initially tested without a capture group, but the result was so unexpected that I needed a capture group to confirm what was happening.
On this input:
2343333ab
The result is that the entire string matches, and the capture group contains 2343333a (without the b at the end). This shows that the upper limit was somehow exceeded.
DEMO in rubular
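To reproduce this result outside Ruby, the sketch below (an illustration only, not part of the original test) spells out the "one or more repetitions" reading explicitly in Java as ((?:.{1,3})+)b and compares it with Java's own possessive handling of (.{1,3}+)b. The class name RangePlusSketch and the helper firstGroup are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangePlusSketch {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "2343333ab";

        // Explicit "repeat .{1,3} one or more times": same capture as the test above.
        System.out.println(firstGroup("((?:.{1,3})+)b", input)); // 2343333a

        // Java reads {1,3}+ as possessive, so at most 3 characters are captured.
        System.out.println(firstGroup("(.{1,3}+)b", input));     // 33a
    }
}
```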
Test 2
This second test shows that the behavior of the range quantifier {n} cannot be changed to possessive, and this probably applies to the other range quantifiers {n,} and {n,m} as well. Instead, the following + simply exhibits the usual "repeat one or more times" behavior.
(My initial conclusion was that + overrides the upper limit, but that turned out to be wrong.)
Regular expression:
(.{3}+)b
Input lines:
23d4344333ab
234344333ab
23434433ab
The matches captured in capture group 1 all have lengths that are multiples of 3. From top to bottom, the regular expression skips 2, 1, and 0 characters of the input lines, respectively.
The input lines with annotations ([] marks the match of the entire regular expression, () marks the text captured by capture group 1):
23[(d4344333a)b]
2[(34344333a)b]
[(23434433a)b]
DEMO in rubular
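The annotated result can be cross-checked with a short Java sketch that writes the "one or more repetitions" interpretation explicitly as ((?:.{3})+)b and reports, for each input line, how many leading characters are skipped and what group 1 captures. This is an emulation of the observed behavior, not a run of the Ruby engine; the class name FixedRangePlusSketch and the method skipAndGroup are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedRangePlusSketch {
    private static final Pattern EXPLICIT = Pattern.compile("((?:.{3})+)b");

    // Returns "<skipped characters> <group 1>" for the first match, or null.
    static String skipAndGroup(String line) {
        Matcher m = EXPLICIT.matcher(line);
        return m.find() ? m.start() + " " + m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {"23d4344333ab", "234344333ab", "23434433ab"};
        for (String line : lines) {
            System.out.println(skipAndGroup(line));
        }
        // Prints:
        // 2 d4344333a
        // 1 34344333a
        // 0 23434433a
    }
}
```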
Test code for the workaround
This is test code in Java showing that both the outer and the inner non-backtracking groups are necessary. ideone
class TestPossessive {
    public static void main(String[] args) {
        String inputText = "123456789012";
        System.out.println("Input string: " + inputText);
        System.out.println("Expected: " + inputText.replaceFirst("(?:\\d{3,4}(?![89])){2,}+", ">$0<"));
        System.out.println("Outer possessive group: " + inputText.replaceFirst("(?>(?:\\d{3,4}(?![89])){2,})", ">$0<"));
        System.out.println("Inner possessive group: " + inputText.replaceFirst("(?>\\d{3,4}(?![89])){2,}", ">$0<"));
        System.out.println("Both: " + inputText.replaceFirst("(?>(?>\\d{3,4}(?![89])){2,})", ">$0<"));
        System.out.println();

        inputText = "aab";
        System.out.println("Input string: " + inputText);
        System.out.println("Expected: " + inputText.replaceFirst(".{1,3}+b", ">$0<"));
        System.out.println("Outer possessive group: " + inputText.replaceFirst("(?>.{1,3})b", ">$0<"));
        System.out.println("Inner possessive group: " + inputText.replaceFirst("(?>.){1,3}b", ">$0<"));
        System.out.println("Both: " + inputText.replaceFirst("(?>(?>.){1,3})b", ">$0<"));
    }
}