Java UTF8 Encoding

Question


I have a scenario in which some special characters are stored in a database (Sybase) in the system default encoding, and I have to fetch this data and send it to a third party in UTF-8 encoding using a Java program.

There is a precondition that the data sent to the third party must not exceed a certain maximum size. Since a single character can become 2 or 3 bytes after conversion to UTF-8, my logic tells me that after fetching the data from the database I should encode it as UTF-8 and only then split the string. Below are my observations:

When a special character is encountered, such as a Chinese or Greek character or any other character outside the ASCII range (code points above 127), converting it to UTF-8 can make that one character take more than 1 byte.
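The size concern above can be sketched as follows. This is only a minimal illustration, assuming the limit is expressed in bytes of the UTF-8 encoded message; the class name Utf8Chunker and the method split are hypothetical names, not part of any library. It counts UTF-8 bytes per code point and cuts the string only between code points, so no multi-byte sequence is broken in half. It does not try to keep combining marks together with their base character.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a String so that each piece fits a UTF-8 byte budget.
class Utf8Chunker {

    /** Number of bytes a single code point occupies in UTF-8. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;     // e.g. Greek
        if (codePoint < 0x10000) return 3;   // e.g. most Chinese characters
        return 4;                            // supplementary planes
    }

    /** Splits text into pieces whose UTF-8 encoding is at most maxBytes long. */
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;   // index where the current chunk begins
        int bytes = 0;   // UTF-8 size of the current chunk so far
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                // The next code point would exceed the budget: close the chunk here.
                chunks.add(text.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += len;
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        if (start < text.length()) {
            chunks.add(text.substring(start));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "abc", two Greek letters, two Chinese characters
        String text = "abc\u03b1\u03b2\u4e2d\u6587";
        for (String chunk : split(text, 5)) {
            System.out.println(chunk.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }
}
```

Splitting the String by byte count, rather than splitting the encoded byte array, avoids ever producing a piece that ends in the middle of a character.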

So, how can I be sure that the conversion is correct? For the conversion, I use the following:

// storing the data from the database in a String
String s = getDataFromDatabase(); // placeholder for the actual database read

// converting all the data into a byte array in UTF-8 encoding
byte[] b = s.getBytes("UTF-8");

// creating a new String, as my split logic is based on the String format
String newString = new String(b, "UTF-8");

But when I print this new string to the console, I get ? for the special characters.

Therefore, I have some doubts:

1. If my conversion logic is wrong, how can I fix it?
2. After converting to UTF-8, can I check whether the conversion worked or not? I mean, whether this is the correct message to send to the third party. I assume that if the message is not readable by a user after the conversion, there is some problem with the conversion.
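One way to approach the second doubt is sketched below. It is only an illustration with hypothetical names (Utf8Check, isValidUtf8, roundTrips): a strict decoder reports malformed bytes instead of silently replacing them, and a round trip shows whether the String survives encoding. Neither check can detect characters that were already damaged when the data was read from the database.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking UTF-8 data before it is sent.
class Utf8Check {

    /** Returns true if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if encoding to UTF-8 and decoding again gives back the same String. */
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return s.equals(new String(utf8, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String text = "\u03b1\u4e2d"; // a Greek and a Chinese character
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(roundTrips(text));
        // A lone lead byte of a 3-byte sequence is not valid UTF-8.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xE4}));
    }
}
```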


+5

java utf-8

one_pacifist · 17 Jan '11 19:54

5 answers

Adrian Pronk · Answer 1 · 2011-08-21T09:07:53+0000

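Internally, strings are Unicode, so there is no need to convert them to Unicode.

As for the ? characters, the cause is the output, not the conversion.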

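When you call System.out.println(myUnicodeString), the Unicode string is converted to the encoding that System.out uses, which is usually the platform default, and characters that this encoding cannot represent come out as ?. On Windows, the default is typically windows-1252.

To have Java produce UTF-8, write to a destination, for example a file, with UTF-8 specified explicitly: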

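// PrintWriter has no (OutputStream, String) constructor, so wrap the stream in an OutputStreamWriter
PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream("filename.txt"), "UTF-8"));
pw.println(myUnicodeString);
pw.close(); // flush and release the file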
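The effect of the output encoding can be seen without a file. The sketch below is only an illustration with hypothetical names (EncodedPrint, printWith): it prints the same String through PrintStreams with different charsets and captures the bytes. US-ASCII is used as the stand-in for an encoding that lacks the character, because it is guaranteed to be available; windows-1252 substitutes ? in the same way for characters it cannot represent. The last lines of main show a UTF-8 PrintStream wrapped around System.out, which helps only if the console itself interprets UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class: shows how the stream's charset changes the bytes written.
class EncodedPrint {

    /** Prints text through a PrintStream using the named charset and returns the bytes written. */
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String alpha = "\u03b1"; // Greek small letter alpha

        // UTF-8 writes the two bytes CE B1.
        System.out.println(printWith(alpha, "UTF-8").length);
        // US-ASCII cannot represent the character and writes a single '?' instead.
        System.out.println(printWith(alpha, "US-ASCII").length);

        // Wrapping System.out makes Java emit UTF-8 bytes to the console.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(alpha);
    }
}
```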

ddyer · Answer 2 · 2011-01-17T21:10:50+0000

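Java strings are Unicode, but not every part of Java displays Unicode correctly, for example AWT and Swing. So the data may be fine even when what you see on screen is not.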

one_pacifist · Answer 3 · 2011-01-18T07:30:24+0000

..

, , ? . : -

a) - : frst u , u .

b) , , , MS, sybase db, txt , ? , , db MS word , . . , , , , . , , ( , , - , , utf8, ).

?

RobAu · Answer 4 · 2013-02-25T20:28:55+0000

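To check, write the output to a file and open it in a hex-editor to see whether the bytes really are UTF8. If they are, the conversion is fine, and anything that looks wrong comes from how the text is displayed.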

Also, this is worth reading: http://www.joelonsoftware.com/articles/Unicode.html
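If no hex-editor is at hand, the same inspection can be sketched in Java itself. The class HexDump and the method toHex are hypothetical names used only for this illustration; the point is to look at the encoded bytes directly instead of trusting the console.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: renders bytes as hex so the encoding can be inspected by eye.
class HexDump {

    /** Formats each byte as two uppercase hex digits, separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (1 byte), Greek alpha (2 bytes), a Chinese character (3 bytes)
        String text = "A\u03b1\u4e2d";
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));
        // prints: 41 CE B1 E4 B8 AD
    }
}
```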

DarioBB · Answer 5 · 2015-04-06T13:15:25+0000

A conversion from iso-8859-1 to utf-8:

// Re-decodes text that was read as ISO-8859-1 although its bytes were actually UTF-8.
public String to_utf8(String fieldvalue) throws UnsupportedEncodingException {
    return new String(fieldvalue.getBytes("ISO-8859-1"), "UTF-8");
}
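A short sketch of when this kind of re-decoding helps and when it hurts, using hypothetical names (MojibakeRepair, reinterpretAsUtf8). It assumes the String was produced by decoding UTF-8 bytes as ISO-8859-1 somewhere upstream; applied to a String that is already correct, the same call replaces every non-Latin-1 character with ?.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of re-decoding a String that was read with the wrong charset.
class MojibakeRepair {

    /** Takes the chars as ISO-8859-1 bytes and decodes those bytes as UTF-8. */
    static String reinterpretAsUtf8(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u03b1\u03b2"; // two Greek letters

        // Simulate the upstream mistake: UTF-8 bytes decoded as ISO-8859-1 (4 garbage chars).
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        // The repair recovers the original text.
        System.out.println(reinterpretAsUtf8(misdecoded).equals(original));

        // Applied to an already correct String, the Greek letters are lost as "??".
        System.out.println(reinterpretAsUtf8(original));
    }
}
```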
