Opening a file in text mode can lead to data loss in Python: why?

The documentation for codecs.open() states:

Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values.

How does using text mode for a file lead to "data loss"? It sounds as if opening a file in text mode could truncate bytes to 7 bits, but I cannot find any mention of this in the documentation: text mode is described only as a way to convert newlines, with no mention of any potential data loss. So what does the documentation for codecs.open() mean?

PS. It is clear that automatically converting a newline to its platform-specific representation requires some caution, but the question is specifically about 8-bit encodings. I would expect that only some encodings are compatible with automatic newline conversion, regardless of whether they are 7-bit or 8-bit encodings. So why are 8-bit encodings singled out in the codecs.open() documentation?

1 answer

I think they mean that some encodings use all 8 bits, at least in some bytes, so that all 256 byte values are possible. In particular, a 0x0A or 0x0D byte can appear in the encoded data without meaning LF or CR.
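To make this concrete, here is a small sketch using UTF-16 as one such encoding (my choice of example; the documentation does not name it). It assumes that text mode behaves like a byte-level LF to CRLF replacement on write, as on Windows; `simulate_text_mode_write` is a hypothetical helper that only models that behaviour, not a real Python API.

```python
# Assumption: text-mode output is modelled as a plain byte-level
# LF -> CRLF replacement. This helper is hypothetical.
def simulate_text_mode_write(data: bytes) -> bytes:
    return data.replace(b"\n", b"\r\n")

text = "\u010a"  # LATIN CAPITAL LETTER C WITH DOT ABOVE, not a newline
encoded = text.encode("utf-16-le")   # code unit 0x010A, low byte first
written = simulate_text_mode_write(encoded)

print(encoded)             # b'\n\x01'
print(written)             # b'\r\n\x01'
print(written == encoded)  # False: the character's bytes were altered
```

The 0x0A byte here is half of an ordinary letter, yet the translation treats it as a line feed and inserts a 0x0D in front of it, so the data that reaches the file no longer decodes to the original text.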

In contrast, in a UTF-8 file the characters CR and LF (and all other characters below 0x80) are always encoded as themselves. Those byte values cannot be part of the encoding of any other character.
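This property can be checked by brute force. The sketch below walks every encodable code point and looks for a 0x0A or 0x0D byte in its encoding; `has_newline_byte` and `offenders` are names made up for this illustration, and UTF-16-LE is included only as a contrasting example.

```python
def has_newline_byte(ch: str, encoding: str) -> bool:
    """True if the encoded form of ch contains a 0x0A or 0x0D byte."""
    data = ch.encode(encoding)
    return b"\x0a" in data or b"\x0d" in data

def offenders(encoding: str) -> list:
    """Code points other than LF and CR whose encoding contains 0x0A or 0x0D."""
    found = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:   # surrogates cannot be encoded
            continue
        if cp in (0x0A, 0x0D):       # the real LF and CR
            continue
        if has_newline_byte(chr(cp), encoding):
            found.append(cp)
    return found

print(offenders("utf-8"))               # empty list
print(len(offenders("utf-16-le")) > 0)  # True
```

For UTF-8 the list is empty because every byte of a multi-byte sequence has its high bit set, so it can never equal 0x0A or 0x0D.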
