It depends on what you mean. If you just want to strip the non-ASCII-letter characters, do the following:
(Update: apparently you also want to keep the digits; in that case, use the second statement.)
    String alphaOnly = input.replaceAll("[^a-zA-Z]+", "");
    String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+", "");
or equivalent:
    String alphaOnly = input.replaceAll("[^\\p{Alpha}]+", "");
    String alphaAndDigits = input.replaceAll("[^\\p{Alpha}\\p{Digit}]+", "");
(String.replaceAll compiles the regex on every call, so if this runs often, precompile the pattern and store it in a constant.)
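A minimal sketch of the precompiled variant, using the same two character classes as above. The class name `AsciiFilter` and its method names are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
final class AsciiFilter {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_ALPHA = Pattern.compile("[^a-zA-Z]+");
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]+");

    private AsciiFilter() {}

    // Removes every character that is not an ASCII letter.
    static String alphaOnly(String input) {
        return NON_ALPHA.matcher(input).replaceAll("");
    }

    // Removes every character that is not an ASCII letter or digit.
    static String alphaAndDigits(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }
}
```

Note that accented letters are dropped entirely by both methods, not converted: `alphaOnly("caf\u00e9 42")` gives `"caf"`.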
Or, with Guava:
    private static final CharMatcher ALNUM =
        CharMatcher.inRange('a', 'z')
            .or(CharMatcher.inRange('A', 'Z'))
            .or(CharMatcher.inRange('0', '9'))
            .precomputed();
    // ...
    String alphaAndDigits = ALNUM.retainFrom(input);
But if you want to turn accented characters into something reasonable that is still ASCII, look at these questions:
- Convert Java string to ASCII
- Java change áéőűú to aeouu
- Remove diacritics from Unicode characters (mapping the accented variants of n to a plain n)
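For a rough idea of what those questions arrive at, here is a sketch that relies on `java.text.Normalizer` from the JDK. This is one common approach, not necessarily the one each linked answer recommends, and the class name `AccentStripper` is made up for illustration:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
final class AccentStripper {
    // Unicode combining marks (category M), e.g. the acute accent U+0301.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");
    // Anything still outside the ASCII range after the marks are removed.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]+");

    private AccentStripper() {}

    static String toAscii(String input) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the marks, keeping the base letters.
        String withoutMarks = COMBINING_MARKS.matcher(decomposed).replaceAll("");
        // Drop whatever has no ASCII base letter at all.
        return NON_ASCII.matcher(withoutMarks).replaceAll("");
    }
}
```

Letters that have no canonical decomposition (for example `ł` or `ø`) are removed by the last step and not mapped to `l` or `o`; handling those needs an explicit replacement table.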
Sean Patrick Floyd, Nov 26 '10 at 7:44