Java byte-to-String encoding problem on Linux

I am implementing a piece of software that works as follows:

I have a Linux server running a vt100 terminal application that displays text. My program connects to the server over telnet, reads the text, and parses pieces of it into the corresponding data. That data is sent to a small client, launched by a web server, which displays it on an HTML page.

My problem is that some special characters, such as "åäö", are displayed as question marks (classic).

Background:
My program reads a stream of bytes using Apache Commons TelnetClient. The byte stream is converted to a String, then the relevant bits are substring'ed and put back together with delimiter characters. After that, the new string is converted back to an array of bytes and sent over a Socket to the client launched by the web server. This client creates a String from the received bytes and prints it to standard output, which the web server reads and outputs as HTML.

Step 1: byte [] → String → byte [] → [send to client]

Step 2: byte [] → String → [print]

Problem:
When I run my Java program on Windows, all characters, including "åäö", are displayed correctly on the resulting HTML page. However, if I run the program on Linux, all special characters are converted to "?" (question mark).

The web server and client are currently running on Windows (step 2).

Code:
The program basically works as follows:

My program:

byte[] data = telnetClient.readData(); // Assume method works and returns a byte[] array of text.

// I have my reasons to append the characters one at a time using a StringBuffer.
StringBuffer buf = new StringBuffer();
for (byte b : data) {
    buf.append((char) (b & 0xFF));
}

String text = buf.toString();

// ...
// Relevant bits are substring'ed and put back into the String.
// ...

ServerSocket serverSocket = new ServerSocket(...);
Socket socket = serverSocket.accept();
serverSocket.close();

socket.getOutputStream().write(text.getBytes());
socket.getOutputStream().flush();

Client executed by web server:

Socket socket = new Socket(...);

byte[] data = readData(socket); // Assume this reads the bytes correctly.

String output = new String(data);

System.out.println(output);

Assume the synchronization between reads and writes works.

Edit:
On Windows the default charset is "windows-1252", whereas on Linux Charset.defaultCharset() reports "US-ASCII". Shouldn't the default on Linux be "UTF-8"?

How do I get this to work on Linux?
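
A quick way to check the default-charset theory is to print what the JVM reports and to see what an ASCII-only charset does to these characters. This is only a sketch: the class name DefaultCharsetCheck and its helper are made up for illustration. On Linux, older JVMs usually derive the default from the locale (LANG / LC_ALL), so a process started with an empty or "C" locale can end up with US-ASCII; JDK 18 and later default to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
class DefaultCharsetCheck {

    // Encodes and then decodes the text with one charset: the same round trip
    // that text.getBytes() / new String(data) perform with the default charset.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // "åäö", written with escapes so the source file encoding does not matter.
        String sample = "\u00e5\u00e4\u00f6";

        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // US-ASCII cannot represent these characters; getBytes substitutes '?'.
        System.out.println("US-ASCII round trip: "
                + roundTrip(sample, StandardCharsets.US_ASCII));

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println("UTF-8 round trip lossless: "
                + roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
    }
}
```

If the first line printed on the Linux machine is US-ASCII, the question marks are produced by text.getBytes() before the data ever leaves the program.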

+1
3

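The problem is that the default encoding differs between the two platforms.

new String(byte[]) and String.getBytes() use the platform default encoding unless you pass one explicitly. Pick one encoding and use it on both sides, for example UTF-8 (hardcoded).

The same applies when you wrap a FileInputStream or FileOutputStream in an InputStreamReader or OutputStreamWriter; these are potentially affected too (they also fall back to the default encoding if none is given).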
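
For the reader/writer case, a minimal sketch might look like the following. It assumes UTF-8 is chosen as the wire encoding; the class ExplicitCharsetStreams and its methods are hypothetical, and ByteArrayOutputStream / ByteArrayInputStream stand in for the socket streams so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
class ExplicitCharsetStreams {

    // Sender side: wrap the raw stream (e.g. socket.getOutputStream())
    // in a Writer that always encodes as UTF-8.
    static void send(OutputStream out, String text) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Receiver side: decode with the same charset until end of stream.
    static String receive(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            StringBuilder text = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        send(wire, "\u00e5\u00e4\u00f6"); // "åäö"
        String received = receive(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println("Round trip lossless: " + received.equals("\u00e5\u00e4\u00f6"));
    }
}
```

Note that System.out also encodes with a platform charset, so the final println in the client may need a PrintStream created with an explicit encoding if the client ever moves off Windows.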

+8

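Use String(byte[] bytes, String encoding) to create the String with an explicit encoding instead of the platform default. (Internally, Java strings are always UTF-16, whatever encoding the bytes were in.)

Use getBytes(String encoding) in the same way when converting back to bytes.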
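
Applied to the code in the question, that could look roughly like this. The overloads that take the encoding name as a String throw the checked UnsupportedEncodingException, so this sketch uses the equivalent java.nio.charset.Charset overloads with StandardCharsets (Java 7+) instead. UTF-8 as the wire encoding is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original program.
class ExplicitEncoding {

    // Sender: use in place of text.getBytes()
    static byte[] toWire(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Client: use in place of new String(data)
    static String fromWire(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e5\u00e4\u00f6"; // "åäö"
        byte[] wire = toWire(text);

        // Each of these characters takes two bytes in UTF-8.
        System.out.println("Bytes on the wire: " + Arrays.toString(wire));
        System.out.println("Decoded correctly: " + fromWire(wire).equals(text));
    }
}
```

Because both sides name the encoding, the result no longer depends on which operating system or locale either process runs under.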

+3

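Do you know which encoding the bytes returned by telnetClient.readData() are in? It looks like windows-1252. If so, don't rely on the default; encode and decode the String as windows-1252 explicitly: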

text.getBytes("windows-1252");

String output = new String(data, "windows-1252");

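You can also use java.nio.charset.Charset to decode the telnet bytes, and UTF-8 between your program and the client: UTF-8 and ISO-8859-1 are always available in Java, so the bytes-to-String conversion then behaves the same on every platform.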
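
As a rough illustration of decoding the telnet bytes with a Charset, the sketch below compares it with the cast loop from the question. It assumes the server really sends windows-1252 (this is not confirmed anywhere in the question) and that the JVM provides that charset, which is not guaranteed by the specification; the class TelnetDecoding is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program.
class TelnetDecoding {

    // Assumption: the terminal application sends windows-1252.
    static final Charset TELNET_CHARSET = Charset.forName("windows-1252");

    // Decode the raw telnet bytes with an explicit charset.
    static String decode(byte[] data) {
        return new String(data, TELNET_CHARSET);
    }

    // The loop from the question: each byte becomes the char with the same
    // numeric value, which is equivalent to decoding as ISO-8859-1.
    static String decodeByCast(byte[] data) {
        StringBuilder buf = new StringBuilder();
        for (byte b : data) {
            buf.append((char) (b & 0xFF));
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        byte[] letters = {(byte) 0xE5, (byte) 0xE4, (byte) 0xF6}; // "åäö"
        byte[] euro = {(byte) 0x80};                               // euro sign in windows-1252

        // Both approaches agree for åäö, which sit at the same positions
        // in windows-1252 and ISO-8859-1.
        System.out.println(decode(letters).equals(decodeByCast(letters)));

        // They differ for bytes 0x80-0x9F, where windows-1252 has printable characters.
        System.out.println(decode(euro).equals(decodeByCast(euro)));
    }
}
```

So the cast in the question is not what breaks "åäö"; the loss happens later, when the String is turned back into bytes with the default charset.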

0
