I have a scenario in which some special characters are stored in a Sybase database in the system default encoding. I need to fetch this data and send it to a third party in UTF-8 encoding from a Java program.
There is a constraint that the data sent to the third party must not exceed a certain maximum size. Since a single character can take 2 or 3 bytes after conversion to UTF-8, my reasoning is that, after reading the data from the database, I should encode it to UTF-8 first and only then split it. Below are my observations:
When a special character such as a Chinese or Greek character, or any character outside the ASCII range (code point above 127), is converted to UTF-8, that one character can be represented by more than 1 byte.
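To make the byte counts concrete, here is a minimal sketch of how the per-character UTF-8 length can be computed and how text could be split so that each piece stays within a byte limit without cutting a character in half. The class name Utf8Chunks, the methods utf8Length and split, and the sample string are all hypothetical and chosen only for illustration; the size limit is assumed to be expressed in bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class Utf8Chunks {
    // Number of bytes UTF-8 uses for one Unicode code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;     // e.g. Greek
        if (codePoint < 0x10000) return 3;   // e.g. most Chinese characters
        return 4;                            // supplementary characters
    }

    // Splits text into pieces whose UTF-8 encoding is at most maxBytes each,
    // always breaking between code points, never inside one.
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = utf8Length(cp);
            if (currentBytes + len > maxBytes) {
                // The next character would not fit, so close the current piece.
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.appendCodePoint(cp);
            currentBytes += len;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String sample = "A\u03A9\u4E2D"; // 'A', Greek capital omega, a Chinese character
        System.out.println(sample.length());                                // 3 chars
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length); // 6 bytes (1 + 2 + 3)
        for (String chunk : split(sample, 4)) {
            System.out.println(chunk.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

Splitting the String by code point while counting bytes avoids cutting a raw byte array at an arbitrary offset, which could leave half of a multi-byte character at the end of one piece.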
So, how can I be sure that the conversion is correct? For the conversion, I use the following:
String s = getDataFromDatabase(); // placeholder for the actual database read
byte[] b = s.getBytes("UTF-8");
String newString = new String(b, "UTF-8");
But when I print this new string to the console, I get ? in place of the special characters.
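One way to narrow this down is sketched below. Encoding a String to UTF-8 and decoding it again yields an equal String, so the round trip itself should not be what produces the ? characters. More likely candidates are the console encoding or an earlier step such as the database read, though that is an assumption that would need checking. The class ConsoleCheck, the helper toHex and the sample text are hypothetical; the sample stands in for the value read from the database.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class, for illustration only.
class ConsoleCheck {
    // Renders bytes as hex so they can be inspected regardless of the console encoding.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u03A9\u4E2D"; // stand-in for the value read from the database
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        String newString = new String(b, StandardCharsets.UTF_8);

        // The round trip gives back an equal string.
        System.out.println(s.equals(newString)); // true

        // Hex view of the encoded bytes.
        System.out.println(toHex(b)); // CE A9 E4 B8 AD

        // Print through a stream that encodes its output as UTF-8.
        // Whether the glyphs show up still depends on the terminal and its font.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(newString);
    }
}
```

If the hex dump of the real data already shows 3F (the byte for ?) where a special character is expected, the character was lost before the UTF-8 step, and the place to look would be how the data is read from the database rather than the conversion code.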
Therefore, I have some doubts:
- If my conversion logic is wrong, how can I fix it?
- After converting to UTF-8, how can I check whether the conversion worked correctly? That is, how do I know this is the correct message to send to the third party? I assume that if the message is not readable by the user after the conversion, there is some problem with the conversion.
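For the second point, a possible programmatic check is sketched here: decode the outgoing bytes with a strict UTF-8 decoder that reports malformed input instead of silently replacing it. The class Utf8Validator and its methods are hypothetical names. Note that this only verifies that the bytes are well-formed UTF-8; it cannot tell whether the text was already damaged before encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8Validator {
    // Returns true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns true if the text contains U+FFFD, the replacement character
    // that lenient decoders insert where they could not decode the input.
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        byte[] good = "\u03A9".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = { (byte) 0xCE }; // first byte of a 2-byte sequence only
        System.out.println(isValidUtf8(good));      // true
        System.out.println(isValidUtf8(truncated)); // false
    }
}
```

Running such a check on every piece after splitting would also catch a piece that was cut in the middle of a multi-byte character.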