Java replace German umlauts

Question

I have the following problem: I am trying to replace German umlauts like ä, ö, ü in Java, but it just doesn't work. Here is my code:

    private static String[][] UMLAUT_REPLACEMENTS = {
        { "Ä", "Ae" }, { "Ü", "Ue" }, { "Ö", "Oe" },
        { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" }, { "ß", "ss" }
    };

    public static String replaceUmlaute(String orig) {
        String result = orig;
        for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
            result = result.replaceAll(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
        }
        return result;
    }

An ä remains ä and so on. I do not know if this problem is related to the encoding, but the string contains the exact character that I am trying to replace.

Thank you in advance

+8

java

user2841991 21 sept '15 at 13:16

7 answers

First, there is a small Unicode problem:

ä can be one code point, LATIN SMALL LETTER A WITH DIAERESIS (U+00E4), or two code points: LATIN SMALL LETTER A followed by COMBINING DIAERESIS (U+0308).

To handle this, you can normalize the Unicode text.

 s = Normalizer.normalize(s, Normalizer.Form.NFKC);

The C stands for composition and gives the compact (composed) form.
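To make the normalization step concrete, here is a minimal sketch, assuming the input may arrive in the decomposed form (a plain a followed by U+0308). The class and method names are made up for illustration; only java.text.Normalizer is the real API.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only.
final class UmlautNormalizer {

    // Composes decomposed characters first, then replaces the umlauts literally.
    static String replaceUmlauts(String s) {
        String composed = Normalizer.normalize(s, Normalizer.Form.NFKC);
        return composed
                .replace("\u00E4", "ae")   // ä
                .replace("\u00F6", "oe")   // ö
                .replace("\u00FC", "ue")   // ü
                .replace("\u00DF", "ss");  // ß
    }

    public static void main(String[] args) {
        String decomposed = "Ba\u0308r";   // 'a' + combining diaeresis, renders like "Bär"
        System.out.println(decomposed.contains("\u00E4"));   // false
        System.out.println(replaceUmlauts(decomposed));      // Baer
    }
}
```

Without the normalize call, the decomposed input would pass through unchanged, because it never contains the single code point U+00E4.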

The second, more prosaic, problem is that the encoding of the Java source in the editor must be the same as the one given to the compiler: javac -encoding ...

You can rule out an encoding mismatch by using \u escapes as a test:

 "\u00E4" // instead of ä

I guess this could be the problem. Using UTF-8 for Java sources and compilation seems to have become the international norm.
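One way to check whether the literal in your source and the string at runtime really contain the same character is to print their code points. A rough sketch; the class and method names are hypothetical:

```java
import java.util.stream.Collectors;

// Hypothetical diagnostic helper, for illustration only.
final class CodePointDump {

    // Renders every code point of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00E4"));        // U+00E4
        System.out.println(dump("a\u0308"));       // U+0061 U+0308
        // What a UTF-8 encoded ä looks like after being read as ISO-8859-1:
        System.out.println(dump("\u00C3\u00A4"));  // U+00C3 U+00A4
    }
}
```

If dump(orig) and dump of your search literal show different code points for what looks like the same letter, the replacement cannot match.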

Alternatively, you can use

  result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);

which does a literal replacement without regex and is faster.
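For what it's worth, here is a small sketch of the difference between the two calls. The wrapper class and its method names are only for illustration; String.replace and String.replaceAll are the real methods.

```java
// Hypothetical wrapper, for illustration only.
final class ReplaceDemo {

    // Literal replacement: the target is matched character by character.
    static String literal(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // Regex replacement: the first argument is compiled as a pattern.
    static String regex(String s, String pattern, String replacement) {
        return s.replaceAll(pattern, replacement);
    }

    public static void main(String[] args) {
        // For a plain umlaut both behave the same:
        System.out.println(literal("M\u00FCller", "\u00FC", "ue")); // Mueller
        System.out.println(regex("M\u00FCller", "\u00FC", "ue"));   // Mueller
        // They differ once the target contains regex metacharacters:
        System.out.println(literal("a.b", ".", "-"));  // a-b
        System.out.println(regex("a.b", ".", "-"));    // ---
    }
}
```

Since the umlaut table contains no metacharacters, switching to replace changes only the speed, not the result.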

+11

Joop eggen 21 sept '15 at 13:26

Your code looks good, replaceAll() should work as expected.

Try this if you also want to keep capital letters (for example, ÜBUNG will become UEBUNG, not UeBUNG):

    private static String replaceUmlaut(String input) {
        // replace all lowercase umlauts
        String output = input.replace("ü", "ue")
                             .replace("ö", "oe")
                             .replace("ä", "ae")
                             .replace("ß", "ss");

        // first replace all capital umlauts in a non-capitalized context (e.g. Übung);
        // replaceAll is needed here because the lookahead is a regex
        output = output.replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                       .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                       .replaceAll("Ä(?=[a-zäöüß ])", "Ae");

        // now replace all the other capital umlauts
        output = output.replace("Ü", "UE")
                       .replace("Ö", "OE")
                       .replace("Ä", "AE");

        return output;
    }

A source

+4

user1438038 21 sept '15 at 13:26

ENCODING ENCODING ENCODING ....

Various input sources can cause string encoding complications. For example, one source may use UTF-8 and another an ISO encoding.

Some people have reported that the code works for them, so most likely your strings have different encodings during processing (different encodings produce different byte arrays, so nothing gets replaced).

To solve your problem at its root, you must make sure that each of your sources uses exactly the same encoding.

Try this exercise; it will hopefully help you solve your problem:

1. Try the following:

    // needs: java.util.Arrays, java.nio.charset.Charset, java.nio.charset.StandardCharsets
    // 1: platform default charset
    System.out.println(Arrays.toString("Ä".getBytes()));
    // 2: explicit UTF-8; 1 and 2 should give the same result if the default is UTF-8
    System.out.println(Arrays.toString("Ä".getBytes(StandardCharsets.UTF_8)));
    // should give a different result from 1 and 2
    System.out.println(Arrays.toString("Ä".getBytes(Charset.forName("UTF-32"))));
    // look at the representation and search for the pattern of numbers
    // (this is the hard bit, I guess)
    System.out.println(Arrays.toString(orig.getBytes()));
    System.out.println(Arrays.toString(orig.getBytes(Charset.forName("UTF-32"))));

The next step is to find out where the orig string comes from. For example, if you receive it from the web, make sure your POST and GET requests use your preferred encoding.
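To illustrate why the origin of the string matters, here is a small sketch that decodes the same bytes with two different charsets. The class and method names are hypothetical, and the byte array stands in for whatever your real input source delivers.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class DecodingDemo {

    // Decodes raw bytes with an explicit charset instead of the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00E4".getBytes(StandardCharsets.UTF_8); // {-61, -92}
        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(right.replace("\u00E4", "ae")); // ae
        System.out.println(wrong.contains("\u00E4"));      // false
        System.out.println(wrong.length());                // 2
    }
}
```

The wrongly decoded string holds two other characters instead of ä, so no umlaut replacement can ever match it.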

EDIT 1:

Try the following:

 { { new String("Ä".getBytes(),"UTF-8"), "Ae" }, ... };

If this does not work, try this:

    byte[] bytes = {-61, -124}; // byte representation of Ä in UTF-8
    String Ae = new String(bytes, "UTF-8");
    { { Ae, "Ae" }, ... }; // and do the same for the rest

+2

nafas 21 sept '15 at 14:17

I just tried running it and it works fine.

If you are not using regular expressions, I would use String.replace rather than String.replaceAll, since it is slightly faster than the latter. The difference between the two is basically that replaceAll can handle regular expressions.

EDIT: I just noticed that people in the comments said the same thing before me, so if you have read those, you can pretty much ignore what I said. As stated there, the problem lies elsewhere in your code, since this fragment works as expected.

+1

Vistari 21 sept '15 at 13:26

It works fine when I try, so this should be an encoding issue.

Check the system encoding. You can add -encoding UTF-8 to your javac compiler command line.

    -encoding encoding
        Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.

+1

Klas lindbäck 21 sept '15 at 13:28

I had to change user1438038's answer:

    private static String replaceUmlaute(String output) {
        String newString = output.replace("\u00fc", "ue")
                .replace("\u00f6", "oe")
                .replace("\u00e4", "ae")
                .replace("\u00df", "ss")
                // capital umlauts followed by a lowercase letter or space
                .replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
                .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
                .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae")
                // all remaining capital umlauts
                .replace("\u00dc", "UE")
                .replace("\u00d6", "OE")
                .replace("\u00c4", "AE");
        return newString;
    }

This should work on any target platform (I had problems with Tomcat on Windows).

0

dermoritz Nov 23 '17 at 8:21

user2841991 · Accepted Answer · 2015-09-22T06:06:16+0000

This finally helped me:

    private static String[][] UMLAUT_REPLACEMENTS = {
        { new String("Ä"), "Ae" }, { new String("Ü"), "Ue" }, { new String("Ö"), "Oe" },
        { new String("ä"), "ae" }, { new String("ü"), "ue" }, { new String("ö"), "oe" },
        { new String("ß"), "ss" }
    };

    public static String replaceUmlaute(String orig) {
        String result = orig;
        for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
            result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
        }
        return result;
    }

So, thanks for all your answers and help. In the end it was a mixture of the answers by nafas (using new String) and Joop Eggen (the correct replace call). You have my upvotes, thank you very much!
