I am trying to track down a problem. Here is what I have so far:
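There is Exists.java:

    import java.io.*;

    public class Exists {
        public static void main(String[] args) {
            // Plain ASCII name.
            new File("aaa").exists();
            // "aaa" followed by U+00E4 (a with diaeresis).
            new File("aaa\u00E4").exists();
            // "aaa" followed by U+00C3 U+00A4, i.e. the UTF-8 bytes of U+00E4 read as Latin-1.
            new File("aaa\u00C3\u00A4").exists();
        }
    }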
And there is java -version:

    java version "1.6.0_20"
    Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
    Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
Now to the interesting part:
    $ strace -f -o strace.out java Exists && grep 'stat("aaa' strace.out
    31942 stat("aaa", 0x41464950) = -1 ENOENT (No such file or directory)
    31942 stat("aaa\303\244", 0x41464950) = -1 ENOENT (No such file or directory)
    31942 stat("aaa\303\203\302\244", 0x41464950) = -1 ENOENT (No such file or directory)
It's good that strace works at the byte level rather than at the character level, as Java does. So far everything is all right. My LANG environment variable is set to en_US.UTF-8, and none of the LC_* variables are set.
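To make the byte-level view concrete, here is a small sketch that encodes the same two names as UTF-8 and prints them roughly the way strace does. The class name OctalBytes and the helper escape are made up for this illustration, and the escaping is simplified (it does not reproduce strace's special cases such as \n or \t).

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Exists.java.
public class OctalBytes {
    // Encodes s in the named charset, then prints printable ASCII bytes as-is
    // and every other byte as a backslash followed by its octal value.
    public static String escape(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("aaa\u00E4", "UTF-8"));       // prints aaa\303\244
        System.out.println(escape("aaa\u00C3\u00A4", "UTF-8")); // prints aaa\303\203\302\244
    }
}
```

Both lines match the arguments of the second and third stat calls above, which is why I consider the UTF-8 case correct.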
Now let's narrow the problem down to a minimal working example:

    $ strace -f -o strace.out env - LC_ALL=en_US.UTF-8 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
    31968 stat("aaa", 0x41a75950) = -1 ENOENT (No such file or directory)
    31968 stat("aaa\303\244", 0x41a75950) = -1 ENOENT (No such file or directory)
    31968 stat("aaa\303\203\302\244", 0x41a75950) = -1 ENOENT (No such file or directory)
It still works. So try another encoding:
    $ strace -f -o strace.out env - LANG=en_US.ISO-8859-1 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
    32070 stat("aaa", 0x407a3950) = -1 ENOENT (No such file or directory)
    32070 stat("aaa?", 0x407a3950) = -1 ENOENT (No such file or directory)
    32070 stat("aaa??", 0x407a3950) = -1 ENOENT (No such file or directory)
So this does not work. One possible reason is that I chose a locale that is not in the list printed by locale -a. But that alone should not cause Java to convert letters to question marks.
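For comparison, the question marks look exactly like what Java produces when it encodes these names with an ASCII-only charset. The sketch below only demonstrates that pattern; it is my assumption, not something I have confirmed, that the JVM actually falls back to such a charset here. The class name QuestionMarks and the helper viaCharset are made up for this illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
public class QuestionMarks {
    // Encodes the name in the given charset and decodes the bytes again,
    // so that any replacement done during encoding becomes visible.
    public static String viaCharset(String name, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] encoded = name.getBytes(cs); // unmappable characters are replaced
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        // US-ASCII cannot represent U+00E4, U+00C3 or U+00A4.
        System.out.println(viaCharset("aaa\u00E4", "US-ASCII"));       // prints aaa?
        System.out.println(viaCharset("aaa\u00C3\u00A4", "US-ASCII")); // prints aaa??

        // With real ISO-8859-1, U+00E4 would be the single byte 0xE4 (octal 344).
        byte[] latin1 = "aaa\u00E4".getBytes(Charset.forName("ISO-8859-1"));
        System.out.println(latin1.length);                              // prints 4
        System.out.println(Integer.toOctalString(latin1[3] & 0xFF));    // prints 344
    }
}
```

So if the ISO-8859-1 locale had taken effect, I would have expected stat("aaa\344", ...) in the second call rather than stat("aaa?", ...).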
Once LANG points to a non-existent locale, setting the sun.jnu.encoding property no longer has any effect. So now I am out of ideas.
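To see which encoding the JVM actually ends up with in each environment, a small diagnostic like the following could help. The class name ShowEncodings is made up for this illustration. file.encoding and Charset.defaultCharset() are standard; sun.jnu.encoding is an internal property of Sun's JVM, so treat its presence and meaning as an assumption (it may print null elsewhere).

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic for illustration only.
public class ShowEncodings {
    // Collects the encoding-related settings the running JVM reports.
    public static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("file.encoding    = ")
          .append(System.getProperty("file.encoding")).append('\n');
        // Internal Sun property; may be null on other JVMs.
        sb.append("sun.jnu.encoding = ")
          .append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("defaultCharset   = ")
          .append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it once as env - LC_ALL=en_US.UTF-8 /home/roland/bin/java ShowEncodings and once as env - LANG=en_US.ISO-8859-1 /home/roland/bin/java ShowEncodings should show whether the second environment silently ends up with a different charset than the one requested.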