Opening a file in text mode can lead to data loss in Python: why?

Question


The documentation for codecs.open() states:

Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values.

How does opening a file in text mode lead to "data loss"? It sounds as if text mode could truncate bytes to 7 bits, but I cannot find any mention of this in the documentation: text mode is described only as a way to convert newlines, with no mention of potential data loss. So what does the codecs.open() documentation mean?

PS: Although it is clear that automatically converting a newline to its platform-specific representation requires some caution, this question is specifically about 8-bit encodings. I would expect that only some encodings are compatible with automatic newline conversion, regardless of whether they are 8-bit or 7-bit encodings. So why does the codecs.open() documentation single out 8-bit encodings?


python codec 8bit 7bit

Eol May 17, '11 at 20:25


1 answer

Igor Nazarenko · Answer 1 · 2011-05-17T20:35:24+0000

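I think they mean that some encodings use all 8 bits, at least in some bytes, so that all 256 byte values are possible. In particular, the encoded data can contain the bytes 0x0A or 0x0D where they do not mean LF or CR. Text-mode newline conversion would rewrite those bytes anyway and corrupt the character they belong to.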

In contrast, in a UTF-8 file, the characters CR and LF (and all other characters below 0x80) always encode as themselves. Those byte values cannot be part of the encoding of any other character.
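To make the difference concrete, here is a minimal sketch. It assumes UTF-16 (little-endian) as the example of an encoding where a 0x0A byte can belong to an ordinary character; the answer above does not name a specific encoding, so that choice is mine. The helper simulate_text_mode_write is hypothetical and only mimics the LF to CR LF rewrite that Windows text mode performs on write.

```python
# Hypothetical helper (not a library function): mimics what Windows text
# mode does on write, where every LF byte (0x0A) becomes CR LF (0x0D 0x0A).
def simulate_text_mode_write(raw: bytes) -> bytes:
    return raw.replace(b"\n", b"\r\n")


# U+010A, LATIN CAPITAL LETTER C WITH DOT ABOVE. It is not a newline.
text = "\u010a"

# In UTF-16-LE the character is the two bytes 0x0A 0x01, so its encoding
# contains a byte that looks like LF.
utf16 = text.encode("utf-16-le")

# In UTF-8 the same character is 0xC4 0x8A; both bytes are >= 0x80.
utf8 = text.encode("utf-8")

damaged_utf16 = simulate_text_mode_write(utf16)   # a stray 0x0D is inserted
untouched_utf8 = simulate_text_mode_write(utf8)   # nothing to rewrite

print("UTF-16-LE:", utf16, "->", damaged_utf16)
print("UTF-8:    ", utf8, "->", untouched_utf8)
```

The UTF-16 bytes come back one byte longer and no longer decode to the original character, while the UTF-8 bytes pass through unchanged, which matches the contrast drawn in the answer.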
