Where is the character encoding of a text file stored in Linux?

I know that the short answer should be “nowhere”; however, there is something in Test 2 below that doesn't quite fit with that.

Test 1. In Gedit, I create a new file containing only the string “aàbï”. I choose “Save As”, where there is a selector for the character encoding, and save it as "Unicode (UTF-8)". Then I repeat the same thing and save it to another file as "ISO-8859-15". The first file is 7 bytes in size (two 1-byte characters, two 2-byte characters, and an LF at the end of the file, as the hex dump shows). The second file is 5 bytes in size (four 1-byte characters in the Latin encoding plus the LF). This indicates that the encoding is not stored anywhere in the file. Apparently, when I open the file in Gedit and it is decoded correctly, Gedit must work out the encoding by analyzing the contents.
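The byte counts in Test 1 can be reproduced outside Gedit. This is only a minimal sketch in Python (my choice of tool, not part of the original test); it encodes the same string both ways and prints the sizes and hex dumps. The variable names are arbitrary.

```python
# The test string "aàbï" plus the trailing LF, written with escapes so the
# result does not depend on how this script itself is saved.
text = "a\u00e0b\u00ef\n"

utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("iso-8859-15")

# Size in bytes followed by a space-separated hex dump
# (bytes.hex with a separator needs Python 3.8+).
print(len(utf8_bytes), utf8_bytes.hex(" "))      # 7 61 c3 a0 62 c3 af 0a
print(len(latin9_bytes), latin9_bytes.hex(" "))  # 5 61 e0 62 ef 0a
```

Neither byte sequence contains any marker naming the encoding; the difference is only in how "à" and "ï" are represented.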

Test 2. I do the same as above, but this time the contents of the file are just "abcd", that is, four ASCII characters. The two saved files have the same size (5 bytes) and identical hex dumps. The two files appear to be identical and indistinguishable, so, again, it seems that no encoding information is included in the files.
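The same check for Test 2, again as a small Python sketch rather than the original Gedit procedure: for pure ASCII text, the two encodings yield exactly the same bytes.

```python
# "abcd" plus the trailing LF: every character is plain ASCII.
text = "abcd\n"

utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("iso-8859-15")

print(utf8_bytes == latin9_bytes)  # True
print(utf8_bytes.hex(" "))         # 61 62 63 64 0a

# The shared byte sequence decodes to the same text under either encoding,
# so nothing in the bytes themselves can reveal which one was chosen at save time.
print(utf8_bytes.decode("utf-8") == latin9_bytes.decode("iso-8859-15"))  # True
```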

However, when I open the two Test 2 files again in Gedit and go to “Save As”, the encoding in which each file was saved is preselected. Gedit can somehow tell that one file was saved as UTF-8 and the other as ISO-8859-15, even though both contain only ASCII characters, which produce the same byte sequence, so the files appear to be identical. How is this possible?

- ? Gedit, , ( ) ?

P.S. , , , , , , , , , .


. , , abcd, , abcd ASCII .

. ( ) ext , gedit, -, . , ( root ) . Gedit , , .


. , , . Test1 , :

  • , UTF-8, gedit UTF-8
  • ISO-8859-15 , UTF-8, gedit ISO-8859-x
  • ISO-8859-15 (, ISO-8859-1) ; .
  • , , gedit , gedit, .

Test2 ASCII ( UTF-8, ISO-8859-15), : gedit , UTF-8 .

