Java UTF8 Encoding

I have a scenario in which some special characters are stored in a database (Sybase) in the system default encoding, and I have to fetch this data and send it to a third party in UTF-8 encoding from a Java program.

There is a precondition that the data sent to the third party must not exceed a certain maximum size. Since a single character can take 2 or 3 bytes after conversion to UTF-8, my reasoning is that after fetching the data from the database I should encode it as UTF-8 and then split it. Below are my observations:

When a special character is encountered, such as a Chinese or Greek character or any other character outside the ASCII range, converting it to UTF-8 can produce more than one byte for that single character.
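The size concern above can be illustrated with a minimal sketch that splits a string so that each piece stays within a byte budget once encoded as UTF-8. The class name Utf8Splitter, the method names, and the 5-byte limit are hypothetical and chosen only for illustration; the sketch assumes the limit applies to the UTF-8 byte length of each piece.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Splitter {

    // Number of bytes one Unicode code point occupies in UTF-8.
    public static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;     // e.g. Greek, accented Latin
        if (codePoint < 0x10000) return 3;   // e.g. most Chinese characters
        return 4;                            // supplementary characters
    }

    // Splits text into pieces whose UTF-8 encoding is at most maxBytes each,
    // never cutting a character (or a surrogate pair) in half.
    public static List<String> splitByUtf8Bytes(String text, int maxBytes) {
        if (maxBytes < 4) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = utf8Length(cp);
            if (currentBytes + len > maxBytes) {
                // The next character would exceed the budget: close this piece.
                parts.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.appendCodePoint(cp);
            currentBytes += len;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // "abc", Greek alpha and beta, then two Chinese characters
        String sample = "abc\u03b1\u03b2\u4e2d\u6587";
        for (String part : splitByUtf8Bytes(sample, 5)) {
            System.out.println(part.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```

Counting bytes per code point while still working on the String avoids having to split the encoded byte array, where a cut could land in the middle of a multi-byte character.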

So how can I be sure that the conversion is correct? For the conversion, I use the following:

// store the data from the database in a String
// (getDataFromDatabase() stands in for the actual database call)
String s = getDataFromDatabase();

// encode the whole string as a UTF-8 byte array
byte[] b = s.getBytes("UTF-8");

// create a new String, because my split logic works on strings
String newString = new String(b, "UTF-8");

But when I print this new string to the console, I get ? for the special characters.

Therefore, I have some doubts:

  • If my conversion logic is wrong, how can I fix it?
  • After converting to UTF-8, how can I check whether the conversion worked? I mean, is this the correct message to send to the third party? I assume that if the message is not readable after the conversion, there is some problem with the conversion.



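Java strings are Unicode internally, so encoding one to UTF-8 and decoding it back gives you the same Unicode string.

The ? is most likely a display problem, not a conversion problem.

When you call System.out.println(myUnicodeString), the Unicode string is converted to the encoding used by System.out, which is normally the platform default and may not be able to represent every character. On Windows, that is typically windows-1252.

To check that Java produces correct UTF-8, write the string to a file in UTF-8 and open the file in an editor that understands UTF-8: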

// write the string to filename.txt encoded as UTF-8
PrintWriter pw = new PrintWriter("filename.txt", "UTF-8");
pw.println(myUnicodeString);
pw.close();
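As a complement to the file check, here is a small sketch of two ways to verify the conversion without trusting the console: compare the string before and after the round trip, and dump the UTF-8 bytes as hex. The class and method names (Utf8RoundTrip, survivesRoundTrip, utf8Hex) are made up for this example. The last lines wrap System.out in a PrintStream that encodes as UTF-8; whether the characters then show correctly still depends on the terminal being set to UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // True if encoding to UTF-8 and decoding again gives back the same string.
    public static boolean survivesRoundTrip(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(encoded, StandardCharsets.UTF_8);
        return decoded.equals(s);
    }

    // Hex dump of the UTF-8 bytes, readable on any console.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String myUnicodeString = "\u03b1\u4e2d"; // Greek alpha, one Chinese character

        System.out.println(survivesRoundTrip(myUnicodeString));
        System.out.println(utf8Hex(myUnicodeString)); // CE B1 E4 B8 AD

        // Print through a stream that encodes as UTF-8 instead of the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(myUnicodeString);
    }
}
```

If the hex dump shows the expected multi-byte sequences, the bytes sent to the third party are correct even when the console prints ?.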

Java strings are Unicode, and Java can display Unicode text, for example in AWT or Swing components.




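A method to convert from ISO-8859-1 to UTF-8:

import java.io.UnsupportedEncodingException;

// Re-reads the ISO-8859-1 bytes of fieldvalue as UTF-8. This helps only when
// UTF-8 data was previously decoded as ISO-8859-1 by mistake.
public String to_utf8(String fieldvalue) throws UnsupportedEncodingException {
    return new String(fieldvalue.getBytes("ISO-8859-1"), "UTF-8");
}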
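To make clearer when a conversion like the one above helps, the following sketch simulates the mistake it undoes: UTF-8 bytes that were decoded as ISO-8859-1. The class name MojibakeDemo and the method reinterpretLatin1AsUtf8 are hypothetical; the method does the same re-decoding, using Charset constants so no checked exception is involved.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Takes the ISO-8859-1 bytes of the input and decodes them as UTF-8.
    public static String reinterpretLatin1AsUtf8(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent

        // Simulate the mistake: UTF-8 bytes (C3 A9) decoded as ISO-8859-1
        // become two separate characters.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2

        // Re-decoding repairs the wrongly decoded string.
        System.out.println(reinterpretLatin1AsUtf8(garbled).equals(original)); // true

        // Applied to a string that was already correct, it does damage instead.
        System.out.println(reinterpretLatin1AsUtf8(original).equals(original)); // false
    }
}
```

So this kind of conversion is a repair for one specific decoding error, not a general way to "convert a String to UTF-8"; a correctly decoded Java String only needs getBytes with the UTF-8 charset when it is sent.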