Is Python dangerous for working with binary files?

I read this in the Python tutorial (http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files):

Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it'll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files.

I don't quite understand how "end-of-line characters in text files are altered" can "corrupt binary data", because I believe binary data does not contain such things as end-of-line characters.

Can someone explain this paragraph to me? It makes me feel like Python does not welcome binary files.

3 answers

You just need to take care, on Windows, to open such files in binary mode (open(filename, "rb")) rather than as text files. After that, there is no problem working with the data.
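A minimal sketch of that advice, assuming Python 3. The helper name copy_binary and the payload bytes are made up for illustration; the payload just happens to contain a lone \n, a \r\n pair and a 0x1A byte. Because both files are opened with "rb" and "wb", no newline translation happens and the copy is byte-for-byte identical.

```python
import os
import tempfile


def copy_binary(src_path, dst_path, chunk_size=64 * 1024):
    """Hypothetical helper: copy a file byte for byte.

    "rb" and "wb" disable all newline translation, on every OS.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            dst.write(chunk)


# Made-up "binary" content containing \n, \r\n and 0x1A bytes.
payload = bytes([0xFF, 0xD8, 0x0A, 0x00, 0x0D, 0x0A, 0x1A, 0xFF])

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.bin")
    dst = os.path.join(tmp, "out.bin")
    with open(src, "wb") as f:
        f.write(payload)
    copy_binary(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(copied == payload)  # True: binary mode leaves every byte alone
```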

In particular, the end of a line on Windows is '\r\n'. If you read a binary file as a text file and write it back, every single '\n' byte is converted to the sequence '\r\n'. If you open the files as binary (for both reading and writing), there is no problem.
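The round trip described above can be simulated on any OS with Python 3, as a sketch under stated assumptions: newline=None on read and newline="\r\n" on write are used here to mimic Windows text-mode behaviour, and latin-1 is chosen only because it maps every byte 0-255 to one character, so decoding never fails. The payload is the 8-byte PNG file signature, which contains both a "\r\n" pair and a lone "\n".

```python
import os
import tempfile

# The 8-byte PNG signature: one "\r\n" pair and one lone "\n".
payload = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "image.png")
    dst = os.path.join(tmp, "copy.png")
    with open(src, "wb") as f:
        f.write(payload)

    # Text-mode read: newline=None turns "\r\n" into "\n".
    with open(src, "r", encoding="latin-1", newline=None) as f:
        text = f.read()

    # Text-mode write: newline="\r\n" turns every "\n" into "\r\n",
    # which is what a text-mode write does on Windows.
    with open(dst, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(text)

    with open(dst, "rb") as f:
        result = f.read()

print(payload)  # b'\x89PNG\r\n\x1a\n'
print(result)   # b'\x89PNG\r\n\x1a\r\n'  (the lone \n gained a \r)
```

The existing "\r\n" survives the round trip, but the lone "\n" does not, so the file grows by one byte and is no longer a valid PNG.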

Python is perfectly capable of processing binary data, and you have to take the same care in any language on Windows, not just Python (the Python developers are simply kind enough to warn you about this OS pitfall). On systems such as Linux, where the end of a line is a single character ('\n'), the distinction between text and binary mode still exists, but reading or writing binary data as text (i.e., without the b flag in the open mode) is much less likely to cause a problem.

"I believe that binary data does not have such things as end-of-line."

Binary files can contain ANY possible byte value, including the byte that represents the \n character. You do not want Python to implicitly convert some bytes of a binary file into others. Python has no way of knowing it is reading binary data unless you tell it so. When Python writes a file in text mode on Windows, it converts every \n character into the OS newline sequence, which on Windows is \r\n; when it reads a file in text mode, it converts \r\n back into \n.
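The read-side conversion can be seen with a short Python 3 sketch. It uses in-memory streams (io.BytesIO wrapped in io.TextIOWrapper) instead of real files, and latin-1 is chosen only so that every byte decodes to exactly one character. Note that Python 3's text mode applies this read-side translation (the default newline=None) on every platform, not only on Windows, so the result is the same wherever you run it.

```python
import io

raw = bytes([0x01, 0x0D, 0x0A, 0x02])  # 4 bytes; the middle two are \r\n

# Binary view: the bytes come back exactly as stored.
binary_view = io.BytesIO(raw).read()

# Text view: the default newline=None translates "\r\n" into "\n".
text_view = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()

print(len(binary_view))  # 4
print(len(text_view))    # 3 (the 0x0D byte is gone)
```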

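It works this way in every programming language that distinguishes text mode from binary mode on Windows, not just Python (C's fopen with "r" versus "rb" behaves the same way).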

Another way to think about it: a file is just a long series of bytes (8 bits each). A byte is an integer, and it can be any integer from 0 to 255. If a byte has the value 10, that is also the ASCII code for the \n character. If the bytes in the file represent binary data, you do not want Python to see a 10 and turn it into two bytes, 13 and 10. Usually, when you read binary data, you want to read, say, the first 2 bytes, which form one number, then the next 4 bytes, which form another number, and so on. If Python suddenly turns one byte into two, this causes two problems: 1) it modifies the data, and 2) every field boundary after that point is shifted, so all the following values are read from the wrong bytes.
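Both problems can be sketched with the standard struct module. The record layout here (a 2-byte number followed by a 4-byte number, little-endian) is hypothetical, and the corruption is simulated with bytes.replace rather than an actual Windows text-mode write.

```python
import struct

# Hypothetical record layout: 2-byte unsigned int, then 4-byte unsigned int,
# little-endian with no padding ("<HI" is 6 bytes).
RECORD = "<HI"

original = struct.pack(RECORD, 10, 7)          # b'\n\x00\x07\x00\x00\x00'
corrupted = original.replace(b"\n", b"\r\n")   # simulated text-mode write

print(struct.unpack_from(RECORD, original))    # (10, 7)
print(struct.unpack_from(RECORD, corrupted))   # (2573, 1792)
print(len(original), len(corrupted))           # 6 7
```

The first value is wrong because its own bytes changed, and the second is wrong because it is now read one byte too early, even though its bytes were never touched.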

Example: suppose the first byte of the file represents the weight of a dog, and its value is 10. The next byte represents the age of the dog, and its value is 1. If Python converts the 10, which is the ASCII code for \n, into the two bytes 13 and 10 (\r\n), the data Python hands you looks like this:

13 10 1

Now when you read the first byte for the dog's weight you get 13, not 10, and when you read the second byte for the dog's age you get 10, not 1.
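The dog example in code, as a small Python 3 sketch. The two-byte record is the hypothetical one from the text, and the text-mode corruption is simulated with bytes.replace.

```python
# Hypothetical two-byte record: weight, then age.
record = bytes([10, 1])

# Simulated text-mode write on Windows: every \n (10) becomes \r\n (13, 10).
corrupted = record.replace(b"\n", b"\r\n")

print(record[0], record[1])        # 10 1
print(list(corrupted))             # [13, 10, 1]
print(corrupted[0], corrupted[1])  # 13 10  (weight and age are both wrong)
```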

We often say that a file contains "characters", but strictly speaking that is false. Computers cannot store characters; they can only store numbers, so a file is simply a long series of numbers. If you tell Python to treat those numbers as ASCII codes that represent characters, Python will give you text.
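A short Python 3 illustration of that point: the same three stored numbers can be viewed as plain integers, as ASCII text, or (for the first two bytes) as a single big-endian integer. Only the interpretation you ask for changes, never the bytes.

```python
data = bytes([72, 105, 10])  # three numbers, as they would sit in a file

print(list(data))                       # [72, 105, 10]
print(repr(data.decode("ascii")))       # 'Hi\n'
print(int.from_bytes(data[:2], "big"))  # 18537 (72 * 256 + 105)
```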

I suppose that "altered slightly" in the Python tutorial means converting Unix end-of-line characters (\n) into Windows end-of-line characters (\r\n). This is only done on Windows, so Unix and Linux do not have this problem.
