What stribizhev's answer indicated and recorded inefficiency in regular expression. There is no catastrophic retreat. The change only slightly delays the StackOverflowError without its permission (see Appendix ).
In the original regular expression, if the first branch (\d|\d\d)-(\d|\d\d) fails, the second branch will again do the additional work corresponding to (\d|\d\d) , which is prefix of the first branch.
( ( (\d|\d\d)-(\d|\d\d) ) | (\d|\d\d) )
When re-writing (as shown in his answer) the prefix (\d|\d\d) will be matched only once, and the engine should check only two different sequels (match -(\d|\d\d) or just an empty string )
(\d|\d\d)(?:-(\d|\d\d))?
Here's how his answer improves the effectiveness of regular expressions. The same method applies to \d|\d\d .
Back to the StackOverflowError problem. If you run a regular expression in a string with 1000 elements, any of the above expressions will raise a StackOverflowError . This is due to the implementation of the Pattern class in Sun / Oracle / OpenJDK, which uses recursion for the greedy and lazy quantifier.
Since the regular expression is not ambiguous , you can fix this problem by doing a quantifier on an external large group. The regular expression is copied from the stribizhev response with some changes:
"(?:[AZ][AZ]\\d\\d[AZ]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++" ^^
Because the implementation uses a loop to implement a quantifier (since there is no need to return), a StackOverflowError cannot occur, no matter how long the input line takes. Using a stack is just one repetition, unlike the case in the question where it grows linearly with the number of elements in a row.
application
Testing program
The following is a test program showing the number of elements that a regular expression can handle. On my system (Oracle JRE, version 1.8.0_25), the regular expression in the question manages to reach 104 * 4 = 416 elements before the failure, stribizhev the answer manages to reach 137 * 4 = 548, the stribizhev answer is changed to remove unnecessary group commands up to 150 * 4 = 600, and the version with possessive quantifier passes all tests (500 * 4 = 2000 elements).
public class SO29758814 { public static void main(String[] args) { String s = ""; for (int i = 1; i <= 500; i++) { s += "RA01D[1-1],RA01D[17-17],RA01D[2-2],RA01D[18-18],"; System.out.print(i); try { // Question System.out.print(" 1 " + s.matches("([AZ][AZ]\\d\\d[AZ]\\[(\\*|(((\\d|\\d\\d)-(\\d|\\d\\d))|(\\d|\\d\\d)))\\](,|$))+")); } catch (Throwable e) { } try { // stribizhev answer System.out.print(" 2 " + s.matches("([AZ]{2}\\d{2}[AZ]\\[(\\*|((\\d{1,2})(?:-(\\d{1,2}))?))\\](?:,|$))+")); } catch (Throwable e) { } try { // stribizhev answer, remove unnecessary groups System.out.print(" 3 " + s.matches("(?:[AZ][AZ]\\d\\d[AZ]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))+")); } catch (Throwable e) { } try { // stribizhev answer, remove unnecessary groups, use possessive quantifier System.out.print(" 4 " + s.matches("(?:[AZ][AZ]\\d\\d[AZ]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++")); } catch (Throwable e) { } System.out.println(); } } }