Problem with file name encoding in Java

When I try to open a file whose name contains accented characters, Java claims the file cannot be found, apparently because of a character set mismatch. I work in UTF-8 on a Linux system (/etc/locales is also UTF-8). JBoss runs with -Dfile.encoding=UTF-8 and the environment variable JBOSS_ENCODING="UTF-8".

In the JSP, I get the file name:

    String fileName = element.getChildText("FileName");
    out.println("File to open: " + fileName);

This displays:

    File to open: aaaaaà.txt

But new File(fileName) does not work: file.exists() simply returns false.

Next I tried listing the directory:

    File[] files = dir.listFiles();
    for (int i = 0; i < files.length; i++) {
        // print each name as Java read it from the disk
        out.println(files[i].getName());
    }

I get: aaaaaÃ.txt
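That kind of output is what you get when UTF-8 bytes are decoded as ISO-8859-1. A minimal sketch of the effect, assuming the accented character in the name is à (U+00E0); the class and method names here are made up for illustration:

```java
import java.nio.charset.Charset;

final class MojibakeDemo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    // Encode the name as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String name) {
        return new String(name.getBytes(UTF_8), LATIN_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("aaaaa\u00E0.txt");
        // Print code points instead of glyphs so the terminal encoding cannot interfere.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The single character à turns into two characters, Ã (U+00C3) followed by a non-breaking space (U+00A0), which is easy to miss on screen.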

Why does Java read and try to open files on the hard drive as if their names were ISO-8859-1? Is this a JBoss configuration issue? A Java configuration issue? How do I get java.io.File to use UTF-8 as the encoding of the file name?

With other tools, the name is always read correctly as UTF-8.

(Note that I am only talking about the file name, not the contents; the file itself may even be invalid.)

2 answers

I am trying to track down the problem. Here is what I have so far:

Here is Exists.java:

    import java.io.File;

    public class Exists {
        public static void main(String[] args) {
            new File("aaa").exists();             // plain ASCII name
            new File("aaa\u00E4").exists();       // "aaaä"
            new File("aaa\u00C3\u00A4").exists(); // "aaaÃ¤", the UTF-8 bytes of ä read as ISO-8859-1
        }
    }

And here is the output of java -version:

    java version "1.6.0_20"
    Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
    Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

Now to the interesting part:

    $ strace -f -o strace.out java Exists && grep 'stat("aaa' strace.out
    31942 stat("aaa", 0x41464950) = -1 ENOENT (No such file or directory)
    31942 stat("aaa\303\244", 0x41464950) = -1 ENOENT (No such file or directory)
    31942 stat("aaa\303\203\302\244", 0x41464950) = -1 ENOENT (No such file or directory)

Fortunately strace works at the byte level, not at the character level as Java does. The bytes passed to stat are the UTF-8 encodings of the names, so everything is fine here. My LANG environment variable is set to en_US.UTF-8, and none of the LC_* variables are set.
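To check by hand that those octal escapes really are the UTF-8 bytes of the Java strings, something like the following sketch can be used. It is a simplified imitation of how strace prints strings (real strace treats control characters, quotes and backslashes specially), and the class and method names are hypothetical:

```java
import java.nio.charset.Charset;

final class StraceEscape {
    // Render the UTF-8 bytes of s roughly the way strace shows them:
    // printable ASCII as-is, every other byte as a backslash plus its octal value.
    static String utf8AsStrace(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName("UTF-8"))) {
            int v = b & 0xFF; // bytes are signed in Java, so mask to 0..255
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8AsStrace("aaa"));
        System.out.println(utf8AsStrace("aaa\u00E4"));
        System.out.println(utf8AsStrace("aaa\u00C3\u00A4"));
    }
}
```

For the three names in Exists.java this yields aaa, aaa\303\244 and aaa\303\203\302\244, matching the stat calls in the trace.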

Now let's narrow the problem down to a minimal working example:

    $ strace -f -o strace.out env - LC_ALL=en_US.UTF-8 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
    31968 stat("aaa", 0x41a75950) = -1 ENOENT (No such file or directory)
    31968 stat("aaa\303\244", 0x41a75950) = -1 ENOENT (No such file or directory)
    31968 stat("aaa\303\203\302\244", 0x41a75950) = -1 ENOENT (No such file or directory)

It still works. So let's try another encoding:

    $ strace -f -o strace.out env - LANG=en_US.ISO-8859-1 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
    32070 stat("aaa", 0x407a3950) = -1 ENOENT (No such file or directory)
    32070 stat("aaa?", 0x407a3950) = -1 ENOENT (No such file or directory)
    32070 stat("aaa??", 0x407a3950) = -1 ENOENT (No such file or directory)

So this does not work. One possible reason is that I chose a locale that is not in the list printed by locale -a. But that should not cause Java to convert letters to question marks.
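For what it's worth, the question marks look exactly like what Java produces when a string is encoded into a charset that cannot represent a character. A plausible reading, which the trace alone does not prove, is that with a locale the system does not know, the C library stays in the default "C" locale and the JVM then encodes file names as plain ASCII. The sketch below only demonstrates the replacement behaviour of String.getBytes; the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;

final class AsciiFallback {
    private static final Charset ASCII = Charset.forName("US-ASCII");

    // Encode to ASCII and decode again; String.getBytes(Charset) substitutes
    // the charset's replacement byte ('?') for every unmappable character.
    static String roundTripThroughAscii(String name) {
        return new String(name.getBytes(ASCII), ASCII);
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughAscii("aaa"));
        System.out.println(roundTripThroughAscii("aaa\u00E4"));
        System.out.println(roundTripThroughAscii("aaa\u00C3\u00A4"));
    }
}
```

The three names from Exists.java come out as aaa, aaa? and aaa??, the same strings that appear in the stat calls above.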

Once LANG points to a non-existent locale, setting the sun.jnu.encoding property no longer has any effect. So for now I am out of ideas.
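To see which encodings a given JVM actually picked up, a small sketch like the following can be run in the same environment as the failing process (for example, with the same LANG and -D options as JBoss). Note that sun.jnu.encoding is an internal, undocumented property and may simply print as null on some JVMs; the class name EncodingReport is hypothetical:

```java
import java.nio.charset.Charset;

final class EncodingReport {
    // Format one system property as "key=value" (value is "null" if unset).
    static String describe(String key) {
        return key + "=" + System.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(describe("file.encoding"));     // encoding for file contents
        System.out.println(describe("sun.jnu.encoding"));  // internal: encoding for file names
        System.out.println("defaultCharset=" + Charset.defaultCharset());
        System.out.println("LANG=" + System.getenv("LANG"));
        System.out.println("LC_ALL=" + System.getenv("LC_ALL"));
    }
}
```

Comparing this output between a shell where file names work and the JBoss process should show whether the two really run under the same locale settings.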
