How platform default character encoding affects platform performance

I read that it's a bad idea to rely on the platform's default character encoding, for example when reading a text file and importing its text into arrays. Could you explain how this can affect cross-platform behavior and how to overcome the problem? Is there an encoding that should be used for cross-platform applications? Thanks

+7
3 answers

This is not about performance, but about reading and displaying encoded text correctly. There are several ways to solve the problem:

  • set the JVM option -Dfile.encoding=utf-8
  • always use the method overloads that take a character encoding parameter. They exist on String, the Reader and Writer classes, etc.

I think the latter is a must. Setting the JVM parameter works as long as you always set it, but if you forget it at some point, there will be unexpected failures in random places.
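To illustrate the second option, here is a minimal sketch that passes the charset explicitly at every conversion between bytes and text. The class name `ExplicitCharsetDemo` and the helper `roundTrip` are made up for this example; the `String`, `Files`, and `StandardCharsets` calls are standard JDK API.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {

    // Hypothetical helper: writes text to a temp file and reads it back,
    // naming UTF-8 explicitly instead of relying on the platform default.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                try (BufferedWriter writer =
                         Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                StringBuilder result = new StringBuilder();
                try (BufferedReader reader =
                         Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    int c;
                    while ((c = reader.read()) != -1) {
                        result.append((char) c);
                    }
                }
                return result.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String text = "\u00e4 \u00f6 \u00fc \u20ac";

        // String overloads that take the charset explicitly.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(decoded.equals(text));
        System.out.println(roundTrip(text).equals(text));
    }
}
```

The same idea applies to `InputStreamReader` and `OutputStreamWriter`, which also have constructors that accept a `Charset`.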

Another point: stick to UTF-8.

See also this question.

+7

This is usually not a problem as long as the files you read and write are not exchanged between platforms. But if you have, for example, a configuration file created on Windows (windows-1252, which is similar to ISO-8859-1) and then run the application on a recent Linux (UTF-8), you will have problems with almost all characters above 127 in that file (e.g. the German umlauts ä, ö, ü, the € sign, or similar characters).
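The mismatch described here can be reproduced in a few lines. This is only a sketch: the class `EncodingMismatchDemo` and the helper `misread` are invented for the illustration, and it assumes the JDK in use ships the windows-1252 charset (common JDK builds do, but it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Assumption: windows-1252 is available in this JDK.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical helper: encodes text the way a Windows default would,
    // then decodes the bytes the way a UTF-8 Linux default would.
    static String misread(String text) {
        byte[] writtenOnWindows = text.getBytes(WINDOWS_1252);
        return new String(writtenOnWindows, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc"; // the umlauts a, o, u as escapes

        // Bytes above 127 are not valid UTF-8 here, so the decoder
        // substitutes the replacement character U+FFFD.
        String garbled = misread(umlauts);
        System.out.println(garbled.equals(umlauts));    // false
        System.out.println(garbled.indexOf('\uFFFD') >= 0); // true

        // Pure ASCII survives, because both encodings agree below 128.
        System.out.println(misread("key=value").equals("key=value")); // true
    }
}
```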

In this case, decide explicitly which encoding you always use, and stick to it. If you use only plain ASCII files (not extended Latin!), you will have no problems at all.

+2

The default encoding varies from OS to OS, and even between users on the same computer in the case of some multilingual installations. This means that data written by the application under one default encoding will not be read back correctly, or will appear corrupted, when it is read under a different default encoding. The Euro symbol (€) is encoded as the byte 80 under windows-1252, A4 under ISO-8859-15, and E2 82 AC under UTF-8.
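The byte values quoted for the Euro symbol can be checked with a small program. The class `EuroBytesDemo` and the helper `hexBytes` are names made up for this sketch, and it assumes the JDK provides the windows-1252 and ISO-8859-15 charsets (UTF-8 is always available).

```java
import java.nio.charset.Charset;

class EuroBytesDemo {

    // Hypothetical helper: encodes text with the named charset and
    // returns the resulting bytes as space-separated uppercase hex.
    static String hexBytes(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the Euro sign, written as an escape

        System.out.println(hexBytes(euro, "windows-1252")); // 80
        System.out.println(hexBytes(euro, "ISO-8859-15"));  // A4
        System.out.println(hexBytes(euro, "UTF-8"));        // E2 82 AC
    }
}
```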

Legacy encodings can cause data loss, as many of them support only a narrow range of code points.
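As a sketch of that data loss: ISO-8859-1 has no code for the Euro sign, so encoding it silently substitutes a question mark, and the original character cannot be recovered. `LossyEncodingDemo`, `throughLatin1`, and `fitsLatin1` are hypothetical names; `CharsetEncoder.canEncode` is the standard way to detect the problem before writing.

```java
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {

    // Hypothetical helper: encodes text as ISO-8859-1 and decodes it again.
    // String.getBytes replaces unmappable characters with the charset's
    // replacement byte, which is '?' for ISO-8859-1.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: reports whether every character in the text
    // can be represented in ISO-8859-1 without loss.
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String price = "10 \u20ac"; // "10" followed by the Euro sign

        System.out.println(throughLatin1(price)); // 10 ?
        System.out.println(fitsLatin1(price));    // false

        // U+00E4 is part of ISO-8859-1, so it survives the round trip.
        System.out.println(fitsLatin1("\u00e4")); // true
    }
}
```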

The only supported way to change the default encoding is to change it in the operating system.

It is usually better to choose encodings explicitly and to prefer a lossless Unicode encoding (usually UTF-8). The decision to make "ANSI" encodings the default on Windows, for example, made more sense back when Windows 95 had to be supported.

+2
