String.getBytes("ISO-8859-1") gives me 16-bit characters on OS X

Using Java 6 to get 8-bit characters from a string:

System.out.println(Arrays.toString("öä".getBytes("ISO-8859-1"))); 

gives me, on Linux: [-10, -28], but on OS X I get: [63, 63, 63, -89]

I seem to get the same result when using the new nio CharsetEncoder class. What am I doing wrong? Or is this an Apple bug? :)

+6
java unicode ascii macos
5 answers

I was able to reproduce this problem by saving the source file as UTF-8 and then telling the compiler that it is really MacRoman:

javac -encoding MacRoman Test.java

I would have thought that javac would default to UTF-8 on OS X, but maybe not. Or maybe you are using an IDE that defaults to MacRoman. In any case, you need to make it use UTF-8 instead.
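The mismatch can also be simulated at runtime, without touching the compiler. This is only a sketch: the class and method names are made up for illustration, and it assumes the JRE ships the extended charset "x-MacRoman" (if it does not, the code falls back to the four characters that MacRoman assigns to the bytes C3 B6 C3 A4).

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class MacRomanMismatch {
    // Takes the UTF-8 bytes of "öä", decodes them as MacRoman (the mistake
    // javac makes when given the wrong -encoding), then encodes the resulting
    // string as ISO-8859-1.
    static byte[] misreadAsMacRoman() {
        byte[] utf8 = "\u00F6\u00E4".getBytes(Charset.forName("UTF-8")); // C3 B6 C3 A4
        String misread;
        if (Charset.isSupported("x-MacRoman")) {
            misread = new String(utf8, Charset.forName("x-MacRoman"));
        } else {
            // Fallback for JREs without the extended charsets:
            // MacRoman maps C3 to U+221A, B6 to U+2202 and A4 to U+00A7.
            misread = "\u221A\u2202\u221A\u00A7";
        }
        return misread.getBytes(Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        // Prints [63, 63, 63, -89], the same bytes as in the question.
        System.out.println(Arrays.toString(misreadAsMacRoman()));
    }
}
```

Three of the four misread characters do not exist in ISO-8859-1 and become '?' (63); only the section sign U+00A7 survives, as -89.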

+4

What is the encoding of the source file? 63 is the code for "?", which means "the character cannot be converted to the specified encoding."

So my guess is that you copied the source file to a Mac and that the source file uses an encoding that the Java compiler on the Mac does not expect. IIRC, OS X expects the file to be UTF-8.
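If you would rather get an error than a silent "?", a CharsetEncoder can be told to report unmappable characters instead of replacing them. A minimal sketch; the class and method names here are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

public class StrictLatin1 {
    // Encodes text as ISO-8859-1 and throws instead of substituting '?'
    // when a character has no ISO-8859-1 representation.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        try {
            // Both characters exist in ISO-8859-1: prints [-10, -28]
            System.out.println(Arrays.toString(encodeStrict("\u00F6\u00E4")));
            // U+221A (square root sign) does not exist in ISO-8859-1: throws
            System.out.println(Arrays.toString(encodeStrict("\u221A")));
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode: " + e);
        }
    }
}
```

By contrast, String.getBytes replaces anything it cannot encode with the charset's replacement byte, which is where the 63s come from.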

+2

Your source file is producing "öä" using combining characters.

Look at this:

System.out.println(Arrays.toString("\u00F6\u00E4".getBytes("ISO-8859-1")));

This prints [-10, -28], as you expect (I would not print it this way myself, but that is beside the point of your question), because the Unicode code points are spelled out explicitly, carved in stone, and your text editor is not allowed to "play smart" by combining "o" and "a" with diacritical marks.

When you run into problems like this, two of the Un*x commands that ship with OS X help you see what is happening under the hood: file and hexdump are very convenient in such cases.

You want to run them on the source file, and you may also want to run them on your class file.
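To see what a decomposed string does to the encoder, you can build one deliberately with java.text.Normalizer. This is just a sketch with a made-up class name; it does not prove that the asker's editor did this.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;
import java.util.Arrays;

public class ComposedVsDecomposed {
    static byte[] latin1(String s) {
        return s.getBytes(Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        // Precomposed code points U+00F6 and U+00E4
        String composed = "\u00F6\u00E4";
        // NFD splits each letter into base letter + combining diaeresis (U+0308)
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(Arrays.toString(latin1(composed)));   // [-10, -28]
        System.out.println(Arrays.toString(latin1(decomposed))); // [111, 63, 97, 63]

        // NFC puts the pieces back together before encoding
        String recomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(Arrays.toString(latin1(recomposed))); // [-10, -28]
    }
}
```

The combining diaeresis has no ISO-8859-1 equivalent, so each one turns into 63. Note that this gives [111, 63, 97, 63], which is not the pattern reported in the question.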
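For reference, here is roughly what to look for in the hexdump output of a UTF-8 source file. The sketch below (hypothetical class name) prints the UTF-8 bytes of the literal in its precomposed and its decomposed form:

```java
import java.nio.charset.Charset;

public class HexDumpSketch {
    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // Precomposed letters: two bytes each
        System.out.println(toHex("\u00F6\u00E4".getBytes(utf8)));   // c3 b6 c3 a4
        // Base letter followed by combining diaeresis: three bytes each
        System.out.println(toHex("o\u0308a\u0308".getBytes(utf8))); // 6f cc 88 61 cc 88
    }
}
```

If the dump of your source file shows something else between the quotes, the file is probably not saved as UTF-8 at all.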

+2

Maybe the character set for the source is not set (and therefore differs according to the system locale)?

Can you run the same compiled class on both systems (without recompiling)?
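A quick way to compare the two machines is to print the default encoding the JVM picked up on each. As far as I know, javac of that era uses the platform default when no -encoding option is given, so treat the snippet below (class name is made up) as a diagnostic sketch rather than a definitive answer:

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    // Reports the encoding the JVM derived from the system locale.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If Linux and OS X print different values, compile with an explicit -encoding option on both so the result no longer depends on the locale.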

+1

Keep in mind that there is more than one way to represent characters. Mac OS X uses Unicode by default, so your string literal may not actually be represented by two bytes. You need to make sure the string is loaded from the appropriate input character set, for example by writing the characters as \u escapes in the source.

0