Why doesn't the regular expression [^0-9!a-zA-z#\$%&'\*\+\-/=\?\^_`\{\|\} ~@ \.]+ work with String.split?

I have this regex:

[^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\} ~@ \\.]+ 

and I'm trying to use it to split out the email address from

 [Email] info@emerycommunications.com 

However, the following Java code:

 String fileStr = "[Email] info@emerycommunications.com ";
 String invalidCharacters = "[^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\} ~@ \\.]+";
 String[] tokens = fileStr.split(invalidCharacters);
 for (String token : tokens) {
     if (token.contains("@")) {
         System.out.println(token);
     }
 }

gives this result:

 [Email] info@emerycommunications.com 

I have absolutely no idea how the character class in invalidCharacters ends up covering [ and ].

2 answers

You have A-z in your character class, and the square bracket characters lie between uppercase Z and lowercase a in ASCII (and Unicode) order. Thus [ and ] are treated as valid characters, not invalid ones. Presumably you meant A-Z.
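A minimal sketch to see the effect of the range: it tests single characters against `[A-z]` and `[A-Za-z]` using `Pattern.matches`. The class `RangeDemo` and the helper `inClass` are hypothetical names for illustration only.

```java
import java.util.regex.Pattern;

public class RangeDemo {
    // True if the single character c is matched by the given character class
    static boolean inClass(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        for (char c : new char[] {'[', ']', 'Q', 'q'}) {
            // [A-z] spans code points 65..122, so it also accepts [ and ]
            System.out.println(c + "  A-z: " + inClass("[A-z]", c)
                    + "  A-Za-z: " + inClass("[A-Za-z]", c));
        }
    }
}
```

For `[` and `]` the first check is true and the second is false; for the letters both are true.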


This is your regex:

 [^0-9!a-zA-z#\$%&'\*\+\-/=\?\^_`\{\|\} ~@ \.]+ 

It matches one or more characters, where each character can be anything except the characters listed between the square brackets. The square brackets themselves are not part of the character set. Also, most of these backslashes are not needed: none of the escaped characters other than the hyphen is special inside a character class.
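One way to check the point about the backslashes is to compare an escaped class with an unescaped one, character by character. The sketch below (hypothetical class `EscapeDemo`) tests every printable ASCII character against both classes and reports whether they agree; it uses only a subset of the question's characters to stay short.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // True if both classes accept exactly the same printable ASCII characters (space to ~)
    static boolean sameOnPrintableAscii(String a, String b) {
        Pattern pa = Pattern.compile(a);
        Pattern pb = Pattern.compile(b);
        for (char c = ' '; c <= '~'; c++) {
            String s = String.valueOf(c);
            if (pa.matcher(s).matches() != pb.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same characters, written with and without backslashes
        String escaped = "[\\$\\*\\+\\?\\{\\|\\}\\.]";
        String plain = "[$*+?{|}.]";
        System.out.println(sameOnPrintableAscii(escaped, plain)); // prints true
    }
}
```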

However, because your A-z range runs from uppercase A to lowercase z, it not only includes the lowercase letters a second time but also every character between uppercase Z and lowercase a, namely [, \, ], ^, _ and `. That is how the brackets end up in the negated character class.
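The list of characters above can be confirmed by walking the code points strictly between `Z` (90) and `a` (97). A small sketch; `BetweenZAndA` and `between` are hypothetical names used only for illustration.

```java
public class BetweenZAndA {
    // Collects every character whose code lies strictly between lo and hi
    static String between(char lo, char hi) {
        StringBuilder sb = new StringBuilder();
        for (char c = (char) (lo + 1); c < hi; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(between('Z', 'a')); // prints [\]^_`
    }
}
```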

If this is not what you intend, this regular expression may be what you are looking for:

 [^0-9!a-zA-Z#$%&'*+/=?^_`{|} ~@. -]+ 

(Moving the hyphen to the very end of the class means it no longer needs a backslash.)
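A sketch of the question's code with a corrected class, assuming the spaces in the question's class are intentional (the output it shows suggests they are). The class uses `A-Z` in place of `A-z`, drops the needless escapes, keeps `/` from the question's class, and puts the hyphen last. `SplitEmail` and `tokens` are illustrative names.

```java
public class SplitEmail {
    // Negated class: A-Z instead of A-z, no needless escapes, hyphen last.
    // The spaces are kept because the question's class lists them too.
    static final String INVALID = "[^0-9!a-zA-Z#$%&'*+/=?^_`{|} ~@. -]+";

    static String[] tokens(String input) {
        return input.split(INVALID);
    }

    public static void main(String[] args) {
        String fileStr = "[Email] info@emerycommunications.com ";
        for (String token : tokens(fileStr)) {
            if (token.contains("@")) {
                // trim() drops the spaces that the class treats as valid
                System.out.println(token.trim()); // prints info@emerycommunications.com
            }
        }
    }
}
```

With the brackets now acting as delimiters, the split yields an empty leading token, `Email`, and the address with its surrounding spaces (the class keeps spaces), which is why `trim()` is applied before printing.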

