I'm struggling to get Eclipse to read Chinese characters correctly, and I'm not sure where I am going wrong.
Specifically, somewhere between reading a line of Chinese (simplified or traditional) from the console and writing it back out, the text gets garbled. Even in a long string of mixed English and Chinese text, only the Chinese characters are affected.
I have reduced the problem to the following test case and annotated it with what I think happens at each stage. Note that I am a student and would very much like to have my understanding confirmed (or corrected) :)
```java
public static void main(String[] args) {
    try {
        boolean isRunning = true;

        //Raw flow of input data from the console
        InputStream inputStream = System.in;
        //Allows you to read the stream, using either the default character encoding, else the specified encoding;
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
        //Adds functionality for converting the stream being read in, into Strings(?)
        BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);

        //Raw flow of output data to the console
        OutputStream outputStream = System.out;
        //Write a stream, from a given bit of text
        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
        //Adds functionality to the base ability to write to a stream
        BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);

        while(isRunning) {
            System.out.println();//force extra newline
            System.out.print("> ");

            //To read in a line of text (as a String):
            String userInput_asString = input_BufferedReader.readLine();

            //To output a line of text:
            String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
            output_BufferedWriter.write(outputToUser_fromString_userSupplied);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline
        }
    }
    catch (Exception e) {
        // TODO: handle exception
    }
}
```
Output Example:
```
> 之謂甚
foo
之謂甚
之謂ç"š

> oaea
foo
之謂甚
oaea

> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;

>
```
What is shown above matches what I see in the Eclipse console and in the Eclipse debugger (when viewing / editing variable values). If I manually change the variable's value in the Eclipse debugger, the code that depends on that value behaves as I would expect, which suggests that the problem lies in how the text is read IN.
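To illustrate what I suspect is happening on the input side, here is a minimal sketch (not a confirmed diagnosis). It assumes the bytes of the character are UTF-8 but end up decoded with a single-byte charset. windows-1252 is only my guess, and `MojibakeDemo` and `misdecode` are names made up for this example. Decoding the UTF-8 bytes of 甚 that way yields three characters that look like the garbled tail in my output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those same bytes with another charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        String original = "\u751A"; // the character 甚, written as an escape to avoid source-encoding issues
        String garbled = misdecode(original, Charset.forName("windows-1252"));
        // Print code points rather than glyphs, so the console encoding cannot distort the result.
        // prints U+00E7, U+201D, U+0161 on separate lines
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

U+00E7 is ç and U+0161 is š, so this at least shows where such characters can come from. It does not explain why only the last character of my input is affected.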
I have tried many different combinations of Scanner / buffered stream [reader | writer], etc., to read and write, both with and without an explicit character encoding. This was not done particularly systematically, though, so I could easily have missed something.
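A more systematic check might be to bypass the readers entirely and look at the raw bytes the console delivers. The following is a minimal sketch under that assumption; `RawInputDump` and `hexDumpLine` are hypothetical names for this example. It reads one line from a stream byte by byte and returns the bytes as hex, so they can be compared with the expected UTF-8 sequence (for 甚 that would be `E7 94 9A`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class RawInputDump {
    // Reads bytes up to (not including) the first '\n' or end of stream
    // and returns them as space-separated two-digit hex values.
    static String hexDumpLine(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                sb.append(String.format("%02X ", b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.print("> ");
        // No Reader is involved here, so no charset decoding takes place.
        System.out.println(hexDumpLine(System.in));
    }
}
```

If the dump already differs from the expected UTF-8 bytes, the bytes are altered before they reach `System.in`. If it matches, the decoding step would be the place to look.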
I tried to configure the Eclipse environment to use UTF-8 wherever possible, but I guess I could have missed a place or two. Please note that the console will correctly output hardcoded Chinese characters.
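To see which encoding the JVM launched by Eclipse actually ends up with, a small report like the one below could be run from the same launch configuration. This is only a sketch; `EncodingReport` and `describe` are names invented for this example, while `Charset.defaultCharset()` and the `file.encoding` system property are standard Java.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Summarizes the encoding-related settings visible to the running JVM.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this reports something other than UTF-8, that would point to a setting I have missed.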
Any help / guidance on this is greatly appreciated :)
java eclipse character-encoding
kwah