Can this be done in one regex?

I need a regular expression to match a string that:

  • contains only the digits 0-9 and spaces
  • all digits must be the same
  • must have at least 2 digits
  • must begin and end with a digit

Matches:

  11
  11111
  1 1 1 1 1
  11 1 1 1 1 1
  1 1 1

No matches:

  "1"       has only one digit
  "11111 "  has a space at the end
  " 11111"  has a space at the beginning
  "12"      digits are different
  "11:"     has another character

I know a regex for each of these requirements on its own, but that means running 4 separate regular expression tests. Can it be done with a single regular expression?

ruby regex
4 answers

Yes, this can be done in one regular expression:

 ^(\d)(?:\1| )*\1$ 

Rubular link

Explanation:

 ^        - Start anchor
 (        - Start parenthesis for capturing
 \d       - A digit
 )        - End parenthesis for capturing
 (?:      - Start parenthesis for grouping only
 \1       - Back reference referring to the digit captured before
 |        - Or
 (space)  - A literal space
 )        - End grouping parenthesis
 *        - Zero or more of the previous match
 \1       - The digit captured before
 $        - End anchor
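
A quick way to check the pattern against the cases from the question is a small Ruby script. This is only a sketch: the constant `SAME_DIGITS` and the helper `same_digits?` are names made up here, not part of the answer. One assumption worth flagging: in Ruby, `^` and `$` match at line boundaries rather than string boundaries, so the sketch swaps in `\A` and `\z` to keep a multi-line string from slipping through.

```ruby
# Hypothetical names; the pattern is the one from the answer,
# with \A and \z used in place of ^ and $ (whole-string anchors in Ruby).
SAME_DIGITS = /\A(\d)(?:\1| )*\1\z/

# Returns true when str consists of one repeated digit, optionally
# separated by spaces, with a digit at both ends.
def same_digits?(str)
  SAME_DIGITS.match?(str)
end

should_match     = ["11", "11111", "1 1 1 1 1", "11 1 1 1 1 1", "1 1 1"]
should_not_match = ["1", "11111 ", " 11111", "12", "11:"]

should_match.each     { |s| puts "#{s.inspect}: #{same_digits?(s)}" }
should_not_match.each { |s| puts "#{s.inspect}: #{same_digits?(s)}" }
```

If the input is known to be a single line, the original `^...$` form behaves the same way on these samples.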

Consider this program:

 #!/usr/bin/perl -l
 $_ = "3 33 3 3";
 print /^(\d)[\1 ]*\1$/ ? 1 : 0;
 print /^(\d)(?:\1| )*\1$/ ? 1 : 0;

It outputs:

 0
 1

The reason becomes obvious when you look at the compiled regular expressions:

 perl -c -Mre=debug /tmp/a
 Compiling REx "^(\d)[\1 ]*\1$"
 synthetic stclass "ANYOF[0-9][{unicode_all}]".
 Final program:
    1: BOL (2)
    2: OPEN1 (4)
    4: DIGIT (5)
    5: CLOSE1 (7)
    7: STAR (19)
    8: ANYOF[\1 ][] (0)
   19: REF1 (21)
   21: EOL (22)
   22: END (0)
 floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
 Compiling REx "^(\d)(?:\1| )*\1$"
 synthetic stclass "ANYOF[0-9][{unicode_all}]".
 Final program:
    1: BOL (2)
    2: OPEN1 (4)
    4: DIGIT (5)
    5: CLOSE1 (7)
    7: CURLYX[1] {0,32767} (17)
    9: BRANCH (12)
   10: REF1 (16)
   12: BRANCH (FAIL)
   13: EXACT < > (16)
   15: TAIL (16)
   16: WHILEM[1/1] (0)
   17: NOTHING (18)
   18: REF1 (20)
   20: EOL (21)
   21: END (0)
 floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1
 /tmp/a syntax OK
 Freeing REx: "^(\d)[\1 ]*\1$"
 Freeing REx: "^(\d)(?:\1| )*\1$"

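Inside a character class, \1 is not a backreference. It is just an ordinary octal escape (the character with code 1), as the ANYOF[\1 ] node shows, whereas the grouped version compiles to REF1!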
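
The same pitfall can be reproduced in Ruby, the language the question is tagged with. As far as I know, Ruby's regex engine also reads `\1` inside a character class as an octal escape (the character with code 1) rather than as a backreference; treat that as an assumption to verify on your own Ruby version. The constant names below are made up for this sketch.

```ruby
# Hypothetical names for the two patterns from the Perl program.
WITH_CLASS = /^(\d)[\1 ]*\1$/      # \1 inside [...] is the character \x01
WITH_GROUP = /^(\d)(?:\1| )*\1$/   # \1 inside (?:...) is a real backreference

input = "3 33 3 3"
puts WITH_CLASS.match?(input) ? 1 : 0
puts WITH_GROUP.match?(input) ? 1 : 0

# The character-class version accepts a literal \x01 between the digits.
puts WITH_CLASS.match?("3\x013") ? 1 : 0
```

The practical takeaway is the same as in Perl: when the repeated element must refer to a captured group, use an alternation inside a non-capturing group rather than a character class.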

 ^(\d)( *\1)+$ 


 /^(\d)(\1| )*\1$/ 
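
To see whether the shorter variants behave like the first answer's pattern on the question's examples, all three can be run side by side. This is a rough sketch in Ruby; `PATTERNS`, `SAMPLES` and `verdicts` are hypothetical names, and the labels in the hash are only descriptions chosen here.

```ruby
# Hypothetical names; the three patterns are copied from the answers.
PATTERNS = {
  "alternation"     => /^(\d)(?:\1| )*\1$/,
  "spaces + digit"  => /^(\d)( *\1)+$/,
  "capturing group" => /^(\d)(\1| )*\1$/
}

SAMPLES = ["11", "11111", "1 1 1 1 1", "1 1 1",
           "1", "11111 ", " 11111", "12", "11:"]

# For one sample, return the true/false answer of every pattern, in order.
def verdicts(sample)
  PATTERNS.values.map { |re| re.match?(sample) }
end

SAMPLES.each do |s|
  puts format("%-12s %s", s.inspect, verdicts(s).inspect)
end
```

The second variant gets the "at least 2 digits" rule from the `+` quantifier: the group ` *\1` has to match at least once after the first digit.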