Consider this program:
    #!/usr/bin/perl -l
    $_ = "3 33 3 3";
    print /^(\d)[\1 ]*\1$/ ? 1 : 0;
    print /^(\d)(?:\1| )*\1$/ ? 1 : 0;
It outputs:
    0
    1
The reason is obvious when you look at the compiled regular expressions:
    perl -c -Mre=debug /tmp/a
    Compiling REx "^(\d)[\1 ]*\1$"
    synthetic stclass "ANYOF[0-9][{unicode_all}]".
    Final program:
       1: BOL (2)
       2: OPEN1 (4)
       4: DIGIT (5)
       5: CLOSE1 (7)
       7: STAR (19)
       8: ANYOF[\1 ][] (0)
      19: REF1 (21)
      21: EOL (22)
      22: END (0)
    floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
    Compiling REx "^(\d)(?:\1| )*\1$"
    synthetic stclass "ANYOF[0-9][{unicode_all}]".
    Final program:
       1: BOL (2)
       2: OPEN1 (4)
       4: DIGIT (5)
       5: CLOSE1 (7)
       7: CURLYX[1] {0,32767} (17)
       9: BRANCH (12)
      10: REF1 (16)
      12: BRANCH (FAIL)
      13: EXACT < > (16)
      15: TAIL (16)
      16: WHILEM[1/1] (0)
      17: NOTHING (18)
      18: REF1 (20)
      20: EOL (21)
      21: END (0)
    floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
    /tmp/a syntax OK
    Freeing REx: "^(\d)[\1 ]*\1$"
    Freeing REx: "^(\d)(?:\1| )*\1$"
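Inside a bracketed character class, \1 is not a backreference at all: it is just an ordinary octal character escape (the character with code 1). That is why the first pattern compiles to ANYOF[\1 ] rather than a REF1 node, while the alternation in the second pattern does get a real REF1.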
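To see the difference in isolation, here is a minimal sketch. The matches() helper and the test strings are made up for illustration and are not part of the program above; the sketch only assumes that \1 inside brackets means the character with code 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the original program:
# returns 1 if $str matches $re, otherwise 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# Inside a bracketed class, \1 is the octal escape for chr(1).
my $class = qr/^[\1]$/;
print matches("\x01", $class), "\n";   # the control character chr(1)
print matches("3",    $class), "\n";   # a digit is not in the class

# Outside a class, \1 refers back to whatever group 1 captured.
my $backref = qr/^(\d)\1$/;
print matches("33", $backref), "\n";   # same digit twice
print matches("34", $backref), "\n";   # different digits
```

This should print 1, 0, 1, 0, one per line. When a repeated item has to be "the captured text or something else", the alternation form (?:\1| ) from the second pattern is the way to write it, since a character class can only hold literal characters.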
tchrist