Java console does not read in Chinese characters correctly

Question

I'm struggling to get Eclipse to read Chinese characters correctly, and I'm not sure where I am going wrong.

In particular, somewhere between reading in a line of Chinese (simplified or traditional) from the console and writing it back out, the text becomes garbled. Even with a longer string of mixed text (English and Chinese characters), only the Chinese characters are affected.

I have cut the problem down to the following test case and annotated it with what I think is happening at each stage. Note that I am a student and would very much like to confirm (or correct) my understanding :)

    public static void main(String[] args) {
        try {
            boolean isRunning = true;

            //Raw flow of input data from the console
            InputStream inputStream = System.in;
            //Allows you to read the stream, using either the default character encoding, else the specified encoding;
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
            //Adds functionality for converting the stream being read in, into Strings(?)
            BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);

            //Raw flow of outputdata to the console
            OutputStream outputStream = System.out;
            //Write a stream, from a given bit of text
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
            //Adds functionality to the base ability to write to a stream
            BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);

            while(isRunning) {
                System.out.println();//force extra newline
                System.out.print("> ");

                //To read in a line of text (as a String):
                String userInput_asString = input_BufferedReader.readLine();

                //To output a line of text:
                String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline

                String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline

                String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
                output_BufferedWriter.write(outputToUser_fromString_userSupplied);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline
            }
        } catch (Exception e) {
            // TODO: handle exception
        }
    }

Output Example:

    > 之謂甚
    foo
    之謂甚
    ä¹‹è¬‚ç"š

    > oaea
    foo
    之謂甚
    oaea

    > mixed input - English: fubar; Chinese: 之謂甚;
    foo
    之謂甚
    mixed input - English: fubar; Chinese: ä¹‹è¬‚ç"š;

    >

The output shown here matches what I see in both the Eclipse console and the Eclipse debugger (when viewing / editing variable values). If I manually change the variable's value in the Eclipse debugger, the code that depends on that value behaves as I would expect, which suggests that the problem lies in how the text is read IN.

I have tried many different combinations of Scanner / buffered stream [reader | writer] and so on, reading and writing with and without an explicit charset, although I did not do this particularly systematically and could easily have missed something.

I have tried to configure the Eclipse environment to use UTF-8 wherever possible, but I may have missed a place or two. Note that the console does output hardcoded Chinese characters correctly.

Any help / guidance on this is greatly appreciated :)

+8

java eclipse character-encoding

kwah Dec 14 '12 at 16:13

3 answers

Try the following: in Eclipse, right-click your main class and choose Run As > Run Configurations. Then go to the Common tab and change the encoding to UTF-8. That should work!
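To check whether an encoding change in the run configuration actually reaches the program, a small diagnostic along these lines may help. It is only a sketch: the class name `EncodingCheck` and the helper `describe()` are made up for illustration, and the values printed depend on the JVM version and on how the program is launched.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Hypothetical helper: reports the encoding settings the running JVM sees.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this from the same Eclipse run configuration as the test case.
        System.out.println(describe());
    }
}
```

If the reported charset is not UTF-8 while the reader in the test case is told to decode UTF-8, the two sides disagree about the bytes, which would be consistent with the garbled input.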

+1

user1178729 Dec 14 '12 at 16:20

This looks like an encoding issue. There may be two problems: 1. You have not enabled the compiler to read anything other than ASCII characters; in your case it needs to be able to read UTF-8 characters. 2. Perhaps you have removed certain language packs? This is unlikely, since you can presumably type Chinese characters.

You should search around to find out how to configure the IDE to compile non-ASCII characters correctly. In Python this is done in the code itself; I'm not sure how it is done in Java.

0

Arash saidi Jan 29 '13 at 11:36

Zenil · Accepted Answer · 2013-01-23T19:05:01+0000

The console seems to be reading the input incorrectly. Here is a link that I believe describes your problem and some workarounds.

http://paranoid-engineering.blogspot.com/2008/05/getting-unicode-output-in-eclipse.html

A simple answer: try setting the JVM system property -Dfile.encoding=UTF-8 in eclipse.ini. (Before you enable this for the whole of Eclipse, you can try setting it in the debug configuration for just this program and see whether it works.)

The link has more suggestions.
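For anyone wondering what "reading input incorrectly" looks like at the byte level, here is a minimal sketch of the general mechanism. It assumes the cause is UTF-8 bytes being decoded with a single-byte charset. ISO-8859-1 is used because every JVM is guaranteed to ship it; the charset actually involved in the Eclipse console may well be a different one (windows-1252 is a guess). The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8, then decode the same bytes with the wrong single-byte charset.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three characters from the question, written as Unicode escapes
        // so that the encoding of this source file does not matter.
        String original = "\u4E4B\u8B02\u751A";
        String garbled = garble(original);

        // Each character takes three bytes in UTF-8, so three characters turn into nine.
        System.out.println(original.length() + " -> " + garbled.length());
        // ASCII text is identical in both charsets, so English survives untouched.
        System.out.println(garble("foo").equals("foo"));
        // Decoding with the right charset restores the text.
        System.out.println(repair(garbled).equals(original));
    }
}
```

This matches the symptom in the question, where English input comes back intact and only the Chinese characters are mangled. It does not by itself show which step in Eclipse applies the wrong charset.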
