Is the regular expression [a-Z] valid, and if so, is it the same as [a-zA-Z]?

Is the regular expression [a-Z] valid, and if so, is it the same as [a-zA-Z]? Note that in [a-Z] the a is lowercase and the Z is uppercase.

Edit:

I received several answers pointing out that while [a-Z] is invalid, [A-z] is valid (but is not the same as [a-zA-Z]), and that is really what I was looking for. What I actually wanted to know was whether [a-zA-Z] could be replaced with something more compact.

Thanks to everyone who contributed to the response.

+13
regex
Nov 01 '09 at 23:59
7 answers

No, a (97) is higher than Z (90), so [a-Z] is not a valid character class. However, [A-z] is not equivalent either, for a different reason: it covers all the letters, but it also contains the characters that lie between the uppercase and lowercase letters: [\]^_`.
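
To make the arithmetic concrete, here is a minimal sketch in Python (chosen only for illustration, since the question names no language) that lists the code points between Z and a and checks which of them [A-z] picks up:

```python
import re

# Code points of the letters that bound the two ASCII letter blocks.
print(ord("A"), ord("Z"), ord("a"), ord("z"))  # 65 90 97 122

# The characters strictly between 'Z' (90) and 'a' (97): code points 91..96.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# [A-z] matches all six of them; [A-Za-z] matches none.
print(re.findall(r"[A-z]", between))     # ['[', '\\', ']', '^', '_', '`']
print(re.findall(r"[A-Za-z]", between))  # []
```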

+33
Nov 02 '09 at 0:07

I am not sure how other languages implement this, but in PHP you can write

 "/[a-z]/i"

and it will be case-insensitive. Other languages probably have something similar.
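
For comparison, Python's re module (again only an illustrative choice) exposes the same idea as the re.IGNORECASE flag. A minimal sketch:

```python
import re

# re.IGNORECASE plays the role of PHP's trailing /i modifier.
letters = re.compile(r"[a-z]+", re.IGNORECASE)

words = ["monkey", "MONKEY", "MonKey", "mon_key"]
print({w: bool(letters.fullmatch(w)) for w in words})
# {'monkey': True, 'MONKEY': True, 'MonKey': True, 'mon_key': False}

# With str patterns, case-insensitive [a-z] also matches a few non-ASCII
# letters such as U+212A KELVIN SIGN; re.ASCII restricts it to A-Z and a-z.
ascii_letters = re.compile(r"[a-z]+", re.IGNORECASE | re.ASCII)
print(bool(letters.fullmatch("\u212a")), bool(ascii_letters.fullmatch("\u212a")))
# True False
```

So in Python, a case-insensitive [a-z] is the compact spelling of [a-zA-Z], with re.ASCII added if non-ASCII letters must be excluded.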

+4
Nov 02 '09 at 0:05

You don't say which language you are using, but in general [a-Z] will not be a valid range, since in ASCII the lowercase letters come after the uppercase letters. [A-z] may be a valid range (covering all upper- and lowercase letters, plus the punctuation characters that lie between Z and a), but that depends on the particular implementation. You can add the i flag to the regular expression to make it case-insensitive; check your implementation's documentation for how to specify this flag.

+3
Nov 02 '09 at 0:09

You can always try:

  print "ok" if "monkey" =~ /[a-Z]/;

Perl says

 Invalid [] range "a-Z" in regex; marked by <-- HERE in m/[a-Z <-- HERE ]/ at az.pl line 4.
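
The same experiment can be run in Python. The helper name try_compile is made up for this sketch; Python's re module rejects the reversed range at compile time, much as Perl does.

```python
import re

def try_compile(pattern: str) -> str:
    """Hypothetical helper: report whether re.compile accepts a pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"error: {exc}"
    return "ok"

for pattern in ["[a-Z]", "[A-z]", "[a-zA-Z]"]:
    print(pattern, "->", try_compile(pattern))
```

[a-Z] reports a "bad character range" error, while the other two patterns compile.
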
+2
Nov 02 '09 at 0:04

If it is valid, it will not do what you expect.

The character code of Z is lower than that of a, so if you swap the endpoints to get the range [Z-a], it is the same as [Z\[\\\]^_`a]; that is, it contains the characters Z and a and the characters in between.

If you use [A-z] to get all the uppercase and lowercase letters, it is still not the same as [A-Za-z]; it is the same as [A-Z\[\\\]^_`a-z].
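
Both equivalences can be checked mechanically by testing every ASCII character against each class. This sketch uses Python's re syntax, where the escapes \[, \\ and \] inside a class behave as in the answer; the helper name matched is hypothetical.

```python
import re

ascii_chars = [chr(c) for c in range(128)]

def matched(pattern: str) -> set:
    """Hypothetical helper: the ASCII characters a one-character class matches."""
    rx = re.compile(pattern)
    return {ch for ch in ascii_chars if rx.fullmatch(ch)}

# [Z-a] (the reversed, valid form of [a-Z]) is Z, a and the six characters between.
assert matched(r"[Z-a]") == matched(r"[Z\[\\\]^_`a]")

# [A-z] equals the explicit class with the punctuation spelled out...
assert matched(r"[A-z]") == matched(r"[A-Z\[\\\]^_`a-z]")

# ...and is strictly wider than [A-Za-z].
print(sorted(matched(r"[A-z]") - matched(r"[A-Za-z]")))
# ['[', '\\', ']', '^', '_', '`']
```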

+2
Nov 02 '09 at 0:12

No, it is not valid, probably because the ASCII values are not sequential from z to A.

+1
Nov 02 '09 at 0:06

I just got bitten by this in a script (not mine).

It seems that grep, awk and sed accept [a-Z] depending on your locale (i.e. the LANG or LC_CTYPE environment variable). In the POSIX locale these tools reject [a-Z], but in some other locales (for example, en_GB.utf8) it works and matches [a-zA-Z].
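
Why a locale can make [a-Z] legal: POSIX leaves range expressions unspecified outside the POSIX locale, and some implementations have used the locale's collation order instead of code points. The Python sketch below is only a toy model of that difference, not how grep, awk or sed are implemented; both orderings and the helper char_range are assumptions made up for illustration (real locale collation is far more complex, and some newer tool versions interpret ranges by code point regardless of locale).

```python
import string

def char_range(order: list, lo: str, hi: str) -> set:
    """Hypothetical model of [lo-hi]: everything from lo to hi in `order`."""
    start, end = order.index(lo), order.index(hi)
    if start > end:
        raise ValueError(f"invalid range {lo}-{hi}")
    return set(order[start:end + 1])

# "C"/POSIX-style model: characters ordered by code point.
codepoint_order = [chr(c) for c in range(128)]

# Assumed dictionary-style model: punctuation first, then a A b B ... z Z.
interleaved = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase) for c in pair]
dictionary_order = list(string.punctuation) + interleaved

try:
    char_range(codepoint_order, "a", "Z")
except ValueError as exc:
    print("code-point order:", exc)  # code-point order: invalid range a-Z

# Under the interleaved ordering, a..Z spans exactly the 52 letters.
print(char_range(dictionary_order, "a", "Z") == set(string.ascii_letters))  # True
```

In this model the punctuation characters sort outside the a..Z block, which is consistent with the observation that [a-Z] matched none of them.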

Yes, I checked: it does not match any of _^[]`.

Given that it took quite a while to debug, I strongly discourage anyone from using [a-Z] in a regular expression.

+1
Oct 17