GSON / JSON: unusual special char problem (umlaut)

When trying to process a JSON response using GSON (output from the flickr API, in case you are asking), I came across what I would call a rather strange encoding of certain special characters:

Original JSON response

Here is the hexadecimal representation:

Hex View of Original JSON response

The “u” followed by the “double dots” is what the German “ü” should be, and this is where my confusion begins. It is as if someone took a char and ripped it in half, encoding each of the two parts. The following figure shows the hexadecimal encoding of what I expect if it is encoded correctly:

Expected Hex View
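To make the two hex views concrete, here is a minimal sketch that prints the UTF-8 bytes of both spellings. It assumes the response is UTF-8; the class name `UmlautBytes` and the helper `hex` are made up for illustration and are not part of GSON or the flickr API.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Hypothetical helper: renders the UTF-8 bytes of a string as spaced hex pairs.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Plain 'u' followed by U+0308 (combining diaeresis): what the response contains
        System.out.println(hex("u\u0308")); // 75 CC 88
        // The single code point U+00FC: what I expected
        System.out.println(hex("\u00FC"));  // C3 BC
    }
}
```

So the "u" plus "double dots" takes three bytes, while the single "ü" takes two.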

Even stranger, in cases where I expected problems to occur (namely, a set of Asian characters), everything seems to work fine, for example: "title": "ナ ガ レ テ ユ ク · · ·"

Questions:

  • Is this some kind of flickr API weirdness, or is it a correct JSON encoding of the response? Or is the JSON correctly encoded, and it is GSON that fails to "reassemble" this response into the original "ü"? Or did the author of the title simply mess it up on his end?
  • How do I solve the problem if it is JSON or GSON that is messing things up (obviously nothing can be done if it was the author)? And how do I find out which "other" characters are affected (ö and ä come to mind, but there are probably more "special cases")?

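This looks like Unicode decomposition: the "ü" has been split into a plain "u" followed by U+0308, the combining diaeresis.

Both forms are valid Unicode and are canonically equivalent.

To get the precomposed form back, you can use java.text.Normalizer (available since Java 1.6):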

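import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.Arrays;

public class NormalizerDemo {
  public static void main(String[] args) {
    // 'u' followed by U+0308 COMBINING DIAERESIS (the decomposed form)
    String decomposed = "Mitgefu\u0308hl";
    printChars(decomposed); // Mitgefühl -- [M, i, t, g, e, f, u, ̈, h, l]
    // NFC composes the pair into the single char U+00FC
    String precomposed = Normalizer.normalize(decomposed, Form.NFC);
    printChars(precomposed); // Mitgefühl -- [M, i, t, g, e, f, ü, h, l]

    // Normalizing with NFC again doesn't hurt:
    String precomposedAgain = Normalizer.normalize(precomposed, Form.NFC);
    printChars(precomposedAgain); // Mitgefühl -- [M, i, t, g, e, f, ü, h, l]
  }

  // Prints the string followed by its individual chars.
  static void printChars(String s) {
    System.out.println(s + " -- " + Arrays.toString(s.toCharArray()));
  }
}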

So if you always normalize to NFC, you should get the precomposed form you expect.
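As for which other characters are affected: any character that has a canonical decomposition (ö, ä, and many accented letters) can arrive this way, so rather than listing them it is probably easier to check each string. Here is a minimal sketch; the class name `TitleNormalizer` and the method `toNfc` are hypothetical, and the sample titles are made up. You would call something like this on the strings after GSON has parsed them.

```java
import java.text.Normalizer;

public class TitleNormalizer {
    // Hypothetical helper: returns the NFC form of a title,
    // or the same reference if it is null or already in NFC.
    static String toNfc(String title) {
        if (title == null || Normalizer.isNormalized(title, Normalizer.Form.NFC)) {
            return title;
        }
        return Normalizer.normalize(title, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Made-up sample titles: three decomposed, one already precomposed
        String[] titles = { "Mitgefu\u0308hl", "scho\u0308n", "Ma\u0308dchen", "Mitgef\u00FChl" };
        for (String t : titles) {
            String fixed = toNfc(t);
            // A title was affected if normalization changed it
            System.out.println(fixed + " (changed: " + !fixed.equals(t) + ")");
        }
    }
}
```

`Normalizer.isNormalized` lets you skip strings that are already fine, which is the common case.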

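Note that when you compare Strings that may contain Unicode text like this, you should normalize them first, because the decomposed and precomposed forms are not equal char by char.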
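The effect on comparisons is easy to demonstrate. In this sketch the class name `UnicodeEquals` and the method `equalsNormalized` are hypothetical names chosen for illustration; only `java.text.Normalizer` comes from the JDK.

```java
import java.text.Normalizer;

public class UnicodeEquals {
    // Hypothetical helper: compares two strings after bringing both to NFC.
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String decomposed = "Mitgefu\u0308hl";  // u + combining diaeresis
        String precomposed = "Mitgef\u00FChl";  // single U+00FC

        // String.equals compares chars, so the two spellings differ
        System.out.println(decomposed.equals(precomposed));            // false
        // After normalizing both sides they compare equal
        System.out.println(equalsNormalized(decomposed, precomposed)); // true
    }
}
```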

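MacOS, as far as I know, uses decomposed forms for file names, so that may be how the title ended up on Flickr in this form.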
