Java: change accented characters to aeouu

Possible duplicates:
Remove diacritics (ń ǹ ň ṅ ņ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ȵ) from Unicode characters
Is there a way to get rid of accents and convert a whole string to regular letters?

How can I do this? Thanks for the help.

+26
java string
Nov 08 '10 at 8:10
3 answers

I think your question is the same as these:

  • Java - getting rid of accents and converting them to regular letters
  • Convert Java string to ascii

and therefore the answer is also the same:

    String convertedString = Normalizer
            .normalize(input, Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "");


Code example:

    final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
    System.out.println(
            Normalizer
                    .normalize(input, Normalizer.Form.NFD)
                    .replaceAll("[^\\p{ASCII}]", "")
    );

Output:

This is a funky String

+84
Nov 08 '10 at 8:17

You can use java.text.Normalizer to separate the base letters and diacritics, and then remove the latter with a regular expression:

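    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    public static String stripDiacriticas(String s) {
        // NFD splits each accented letter into a base letter plus combining marks;
        // the regex then removes the marks.
        return Normalizer.normalize(s, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }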
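
The two regular expressions used in this thread behave differently on letters that have no canonical decomposition, such as 'ø'. Here is a rough sketch comparing them side by side; the class name `StripCompare`, the method names, and the sample input are only illustrative, not part of any library.

```java
import java.text.Normalizer;

public class StripCompare {
    // Variant with [^\p{ASCII}]: after decomposition, drop everything outside ASCII.
    public static String asciiOnly(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    // Variant with InCombiningDiacriticalMarks: drop only the combining marks.
    public static String marksOnly(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String input = "smørbrød áéőűú";
        System.out.println(asciiOnly(input)); // prints "smrbrd aeouu"
        System.out.println(marksOnly(input)); // prints "smørbrød aeouu"
    }
}
```

Both variants turn áéőűú into aeouu. The difference is that the ASCII variant silently deletes letters like 'ø', while the combining-marks variant leaves them in place, so pick the one that matches what you want to happen to such characters.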
+9
Nov 08 '10 at 8:15

First off, you shouldn't. These symbols carry special phonetic properties that cannot be ignored.

The way to convert them is to create a Map containing each pair:

    Map<Character, Character> map = new HashMap<Character, Character>();
    map.put('á', 'a');
    map.put('é', 'e');
    // etc.

and then loop over the characters of the string, building a new string by calling map.get(currentChar) for each one.
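
Put together, the loop might look like the sketch below. The class name `AccentMap`, the method name `replaceMapped`, and the five table entries are just placeholders for illustration; a real table would need every pair you care about. Note that a plain map.get(currentChar) returns null for characters that are not in the map, so this sketch assumes Java 8+ and uses getOrDefault to copy such characters through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

public class AccentMap {
    // Illustrative, deliberately incomplete table: add every pair you need.
    private static final Map<Character, Character> MAP = new HashMap<>();
    static {
        MAP.put('á', 'a');
        MAP.put('é', 'e');
        MAP.put('ő', 'o');
        MAP.put('ű', 'u');
        MAP.put('ú', 'u');
    }

    public static String replaceMapped(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Characters without an entry are kept as they are.
            char mapped = MAP.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceMapped("áéőűú")); // prints "aeouu"
    }
}
```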

+6
Nov 08 '10 at 8:12
