How to ensure that a Java program uses UTF-8 encoding

I recently discovered that relying on the JVM's default encoding is causing bugs. I need to specify an encoding explicitly, e.g. UTF-8, when working with String, InputStream, etc. I have a huge code base to check for this. Could someone suggest an easier way to check than searching the entire code base?

Thanks.

+7
java utf-8
4 answers

This is not a direct answer, but to make the work easier, it is good to know that in any halfway decent IDE you can search for usages of InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]), Properties#load(), URLEncoder#encode(), URLDecoder#decode() and friends, where you can pass the encoding, and then update them accordingly. You will also want to find FileReader and FileWriter and replace them with the first two classes mentioned. True, this is a tedious task, but it is worth it, and I would prefer it over relying on environment-specific settings.
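As a rough illustration of the kind of replacement described here, the sketch below passes a charset explicitly at every conversion between text and bytes. The class and method names (ExplicitCharsetDemo, roundTrip) are made up for this example, and it uses in-memory streams only to stay self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: every text/byte conversion names its charset.
public class ExplicitCharsetDemo {

    static String roundTrip(String text) {
        // String#getBytes(Charset) instead of the no-argument String#getBytes()
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // String(byte[], Charset) instead of String(byte[])
        String decoded = new String(bytes, StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // OutputStreamWriter with a charset instead of FileWriter
            try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                writer.write(decoded);
            }
            // InputStreamReader with a charset instead of FileReader
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new ByteArrayInputStream(out.toByteArray()), StandardCharsets.UTF_8))) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "gr\u00fc\u00df dich";
        // The text survives because both directions use the same charset.
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

With file streams, the same pattern would wrap a FileInputStream or FileOutputStream in place of the byte-array streams.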

In Eclipse, for example, select the project of interest, press Ctrl+H, switch to the "Java Search" tab, enter, for example, InputStreamReader, select "Constructor" under "Search For", choose "Sources" as the only "Search In" option, and run the search.

+3
System.getProperty("file.encoding")

returns the JVM's default encoding for I/O operations.

You can set it by passing -Dfile.encoding=utf-8 as a JVM argument.
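To see what a given JVM actually uses, a small check along these lines may help. It is only a sketch: DefaultCharsetCheck and describeDefaults are hypothetical names, and the output depends on the platform and on the JVM arguments.

```java
import java.nio.charset.Charset;

// Hypothetical helper that reports the encoding the JVM falls back to.
public class DefaultCharsetCheck {

    static String describeDefaults() {
        // The system property and the effective default charset are reported
        // separately so that any difference between them is visible.
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it once as java DefaultCharsetCheck and once as java -Dfile.encoding=UTF-8 DefaultCharsetCheck shows whether the argument takes effect in your environment.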

+4

"relying on the default JVM encoding is causing errors"

Indeed, you should always specify the encoding when encoding or decoding.

If a single global default character set is enough for all of your encoding and decoding (it is not always), you can go with Bozho's answer: specify a known default in your JVM arguments or in a static initializer.

But it is good practice to find every place in the code where the encoding is implicit and replace it with an explicit charset. Some typical methods and classes to look at: FileWriter, FileReader, InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]).
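If an IDE search is not convenient, a throwaway scanner can produce a first list of candidates. The following is a minimal sketch, not a complete tool: ImplicitCharsetScanner and suspectLines are hypothetical names, and the line-based regular expressions are a rough heuristic that will miss cases such as String(byte[]) and calls split across lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical heuristic scanner for calls that use the default charset.
public class ImplicitCharsetScanner {

    // getBytes() with no argument always uses the default charset.
    private static final Pattern NO_ARG_GET_BYTES =
            Pattern.compile("\\.getBytes\\s*\\(\\s*\\)");

    // These constructors are suspicious unless the line also names a charset.
    private static final Pattern READER_WRITER_CTOR = Pattern.compile(
            "new\\s+(FileReader|FileWriter|InputStreamReader|OutputStreamWriter)\\s*\\(");

    // Returns the 1-based numbers of the lines that look suspicious.
    static List<Integer> suspectLines(List<String> sourceLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i);
            boolean mentionsCharset = line.contains("Charset") || line.contains("UTF");
            if (NO_ARG_GET_BYTES.matcher(line).find()
                    || (READER_WRITER_CTOR.matcher(line).find() && !mentionsCharset)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths.filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            // ISO-8859-1 maps every byte to a char, so reading never fails;
            // the patterns above only contain ASCII characters.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int lineNumber : suspectLines(lines)) {
                System.out.println(file + ":" + lineNumber);
            }
        }
    }
}
```

Each reported line still needs a manual look; the scanner only narrows down where to start.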

0

If the file is handled by native tools on the server, you may need to use the encoding returned by System.getProperty("file.encoding"). I have run into errors in both directions.

Best practice is to know which character set is being used and to set it explicitly. Also, if the file is used to exchange data with another application, you must find out which character set that application uses. It may be a Windows code page or another UTF format.
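A small sketch of why the agreed character set matters: the same bytes decode differently depending on the charset used. KnownCharsetDemo and decode are hypothetical names, and ISO-8859-1 stands in here for whatever charset the other application really uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding bytes with the agreed charset versus a wrong one.
public class KnownCharsetDemo {

    // Decodes data with the charset named by the other application.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // Bytes as an ISO-8859-1 producer would write them.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the producer's charset restores the text.
        System.out.println(text.equals(decode(latin1, "ISO-8859-1")));
        // Decoding the same bytes as UTF-8 does not.
        System.out.println(text.equals(decode(latin1, "UTF-8")));
    }
}
```

For a Windows code page, the name passed to Charset.forName would be something like windows-1252, provided the JVM in use supports it; Charset.isSupported can be used to check.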

0
