Out of curiosity, I wonder how the JVM knows the default source encoding ...
The mechanism the JVM uses to determine the default encoding is platform dependent. On UNIX / UNIX-like systems, it is determined by the LANG and LC_* environment variables; see "man locale".
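To see what the JVM actually picked up, you can print the default charset from inside Java (a minimal sketch; the exact output depends on your platform, locale settings, and JVM version):

```java
import java.nio.charset.Charset;

public class DefaultCharsetProbe {
    public static void main(String[] args) {
        // The charset the JVM derived from the platform (e.g. LANG/LC_* on UNIX).
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // The system property that reflects the same decision; note that since
        // JDK 18 (JEP 400) this defaults to UTF-8 regardless of the platform.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```

Running this with, say, LANG=en_US.UTF-8 versus LANG=C on a UNIX machine (on a pre-18 JDK) shows the platform dependence directly.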
Ermmm. Is this the command used to check what the default encoding is on a particular OS?
That's right. But I mentioned it because the man page describes how the default encoding is determined by those environment variables.
In retrospect, this may not be what you meant in your original comment, but it is how the default encoding for the platform is determined. (And the concept of a "default character set" for an individual file does not make sense; see below.)
What if I say that I have 10 Java source files, half of them saved as UTF-8 and the rest saved as UTF-16? After compilation I move them (the class files) to another OS platform; how does the JVM then know their default encoding? Is the encoding information included in the Java class file?
This is a rather confusing set of questions:
1. A text file does not have a default character set. It has a character set / encoding.
2. A non-text file has no character encoding at all. The concept is meaningless for it.
3. There is no 100% reliable way to determine what the character encoding of a text file is.
4. If you do not tell the Java compiler what the file encoding is, it assumes the default encoding for the platform. The compiler does not try to guess, and if you get the encoding wrong, it may not even notice your mistake.
5. Bytecode files (".class" files) are binary files (see point 2).
6. When character and string literals are compiled into a ".class" file, they are represented in a form that is not affected by the platform default encoding or anything else you can influence.
7. If you got the encoding of a source file wrong during compilation, you cannot fix it at the ".class" file level. Your only option is to go back and recompile the classes, telling the Java compiler the correct source file encoding.
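Recompiling with an explicit encoding can be sketched with the standard javax.tools compiler API; the file name Greeting.java and its contents below are made up for illustration, and a JDK (not just a JRE) is assumed:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class RecompileWithEncoding {
    public static void main(String[] args) throws Exception {
        // A made-up source file containing a non-ASCII string literal, saved as UTF-8.
        Path dir = Files.createTempDirectory("enc-demo");
        Path src = dir.resolve("Greeting.java");
        String code = "public class Greeting { public static final String S = \"caf\u00e9\"; }";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));

        // Tell the compiler the source encoding explicitly (equivalent to
        // "javac -encoding UTF-8 Greeting.java") instead of relying on the
        // platform default.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        int rc = javac.run(null, null, null, "-encoding", "UTF-8", src.toString());
        System.out.println(rc == 0 ? "compiled OK" : "compile failed");
    }
}
```

On the command line, the same thing is just "javac -encoding UTF-8 Greeting.java".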
"What if I say that I have 10 Java source files, half of them are saved as UTF-8, and the rest are saved as UTF-16."
Just don't do it!
- Do not save source files in a mixture of encodings. You will be asking for trouble.
- And I would not save source files as UTF-16 at all ...
So, I am still confused: when people say "platform dependent", is that related to the source file?
Platform dependent means that it potentially depends on the operating system, the JVM vendor and version, the hardware, etc.
This is not necessarily related to the source file. (The encoding of any given source file may differ from the default encoding.)
If not, how do you explain the behavior above? In any case, that confusion extends my question to: "So what happens after I compile the source file into a class file? Since the class file may not contain encoding information, does the result now really depend on the 'platform' and not on the source file?"
A platform-specific mechanism (such as environment variables) determines what the Java compiler sees as the default character set. If you do not override this (for example, by passing an option to the Java compiler on the command line), that is what the compiler will use as the character set of the source files. However, this may not be the correct encoding for the source files; for example, if you created them on another machine with a different default character set. And if the Java compiler uses the wrong character set to decode your source files, it can put the wrong character codes into the ".class" files.
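The effect can be reproduced directly: decode UTF-8 bytes as if they were some other charset, and you get different (wrong) character codes. A small self-contained sketch, using ISO-8859-1 as the stand-in "wrong" default:

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    public static void main(String[] args) {
        String original = "caf\u00e9";                             // "café"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);   // 'é' becomes 0xC3 0xA9

        // What a compiler (or any reader) sees if it decodes those
        // bytes with the wrong charset:
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread);   // prints "cafÃ©"
        // Once the wrong characters are baked into a .class file, the damage
        // is done; the fix is to re-read the source with the right charset.
    }
}
```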
The ".class" files are platform independent. But if they were created incorrectly because you did not tell the Java compiler the correct encoding for the source files, they will contain the wrong characters.
What do you mean by: "the concept of a 'default character set' for a single file is meaningless"?
I say this because it is true!
The default character set means the character set that is used when you do not specify one.
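In Java API terms, "not specifying one" is exactly the difference between the charset-less and charset-taking overloads (a sketch; note that since JDK 18 the JVM-wide default is UTF-8, so the implicit result varies by JVM version and platform):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultVsExplicit {
    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};   // 'é' encoded in UTF-8

        // No charset given: the platform/JVM default charset is used.
        String implicit = new String(bytes);

        // Charset given explicitly: the result is the same everywhere.
        String explicit = new String(bytes, StandardCharsets.UTF_8);

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit decode: " + implicit);
        System.out.println("explicit decode: " + explicit);   // always "é"
    }
}
```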
But we can control how a text file is saved, right? Even in Notepad you can choose between encodings.
That's right. And it is you TELLING Notepad which character set to use for the file. If you do not tell it, Notepad will use the default character set to write the file.
Notepad uses a bit of black magic to guess the character encoding when it reads a text file. Basically, it looks at the first few bytes of the file to see whether it starts with a byte order mark (BOM). If it sees one, it can heuristically distinguish between UTF-16, UTF-8 (as generated by Microsoft tools) and "other". But it cannot distinguish between the different "other" character encodings, and it does not recognize a UTF-8 file that does not start with a BOM. (Putting a BOM in a UTF-8 file is standard practice for Microsoft ... and causes problems if a Java application reads the file and does not know to skip the BOM character.)
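That kind of sniffing can be approximated by checking the first bytes for a BOM. A sketch (real detection logic is considerably more involved; class and method names here are made up):

```java
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /** Guess an encoding from a byte order mark; null means "no BOM, can't tell". */
    public static String guessByBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";      // Microsoft-style UTF-8 BOM
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;             // BOM-less UTF-8 and all the "others" look alike
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(guessByBom(withBom));                                // UTF-8
        System.out.println(guessByBom("hi".getBytes(StandardCharsets.UTF_8)));  // null
    }
}
```

Note the last case: a BOM-less UTF-8 file is indistinguishable from the "other" encodings by this check, which is exactly the limitation described above.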
In any case, the problem is not in writing the source files. The problems occur when the Java compiler reads a source file with the wrong character encoding.