When to open a file in binary mode (b)?

I noticed that in the docs they always open the CSV file using wb. Why "b"? I know that b means binary mode, but when should you use binary mode (I would guess a CSV file is not binary)? In case it is relevant, I am writing to CSV from the query results of arcpy.da.SearchCursor().

EDIT: I just noticed that, according to this answer, wb+ is used for writing a binary file. What does including the + do?

+6
6 answers

When opening a file, the default is text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.

On Windows, this changes line breaks from '\n' to '\r\n', which causes problems when the CSV file is opened in other applications or on other platforms.

Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which improves portability. In Python 2, on systems that do not make this distinction, adding 'b' has no effect.
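A minimal Python 3 sketch of the translation described above. So that it behaves the same on any OS, it passes newline='\r\n' to imitate what text mode does by default on Windows; the file name demo.txt is only illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Text mode: every '\n' written is translated to the configured line ending.
# newline='\r\n' imitates the Windows default so the effect is visible anywhere.
with open(path, "w", newline="\r\n") as f:
    f.write("a\nb\n")

# Binary mode on read: bytes come back exactly as stored.
with open(path, "rb") as f:
    raw_text_mode = f.read()

# Binary mode on write: bytes go to disk untouched.
with open(path, "wb") as f:
    f.write(b"a\nb\n")

with open(path, "rb") as f:
    raw_binary_mode = f.read()

print(raw_text_mode)    # b'a\r\nb\r\n'
print(raw_binary_mode)  # b'a\nb\n'
```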

Note: 'w+' truncates the file.

The modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing).
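The three update modes can be compared with a short Python 3 sketch. The file name modes.txt and the sample strings are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("hello")

# 'r+' keeps the existing content and allows both reading and writing.
with open(path, "r+") as f:
    before = f.read()
    f.seek(0)
    f.write("J")            # overwrite the first character in place
    f.seek(0)
    after_r_plus = f.read()

# 'a+' also keeps the content; writes always go to the end of the file.
with open(path, "a+") as f:
    f.write("!")
    f.seek(0)
    after_a_plus = f.read()

# 'w+' empties the file on open, then allows reading back what was written.
with open(path, "w+") as f:
    at_open = f.read()      # nothing left to read after truncation
    f.write("new")
    f.seek(0)
    after_w_plus = f.read()

print(before, after_r_plus, after_a_plus, repr(at_open), after_w_plus)
```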

As detailed here: https://docs.python.org/2/library/functions.html#open

+2

Use mode 'b' to read/write binary data as is, without any transformation such as converting newlines to/from the platform-specific value or decoding/encoding text using a character encoding.

csv is special. CSV data is text, so text mode would be expected, but the csv module uses '\r\n' as the line terminator by default on all platforms, and it always recognizes both '\r' and '\n' as newlines. If you open the corresponding file in text mode (universal newlines), you will get '\r\r\n' (corrupted newlines) on Windows (os.linesep == '\r\n' there). This is why the Python 2 docs say you must use binary mode. Python 3 uses text mode, but you must pass newline='' to disable universal newlines. You also want to disable universal newlines if you want to preserve bare newlines (for example, '\r') embedded in fields.
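As a small Python 3 illustration of the decoding/encoding part: text mode hands back str decoded with the given encoding, while binary mode hands back the raw bytes. The file name and the sample string are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "enc.txt")  # hypothetical file name

# Text mode encodes str to bytes on write (UTF-8 here, chosen explicitly).
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("café\n")

# Binary mode: no decoding, the two-byte UTF-8 form of 'é' is visible.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text mode: bytes are decoded back to str.
with open(path, "r", encoding="utf-8", newline="") as f:
    as_text = f.read()

print(as_bytes)  # b'caf\xc3\xa9\n'
print(as_text)   # café
```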
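The '\r\r\n' corruption can be reproduced on any platform with a Python 3 sketch like the one below. It assumes newline='\r\n' as a stand-in for the default Windows text-mode translation; the helper write_csv and the file names are hypothetical.

```python
import csv
import os
import tempfile

rows = [["id", "name"], ["1", "Ann"]]
tmp = tempfile.mkdtemp()

def write_csv(path, newline):
    """Write `rows` with csv.writer, then return the raw bytes on disk."""
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        return f.read()

# Simulated Windows text mode: the '\n' inside the writer's '\r\n'
# terminator is expanded a second time.
corrupted = write_csv(os.path.join(tmp, "bad.csv"), "\r\n")

# newline='' switches translation off, as the Python 3 csv docs recommend.
clean = write_csv(os.path.join(tmp, "good.csv"), "")

print(corrupted)  # b'id,name\r\r\n1,Ann\r\r\n'
print(clean)      # b'id,name\r\n1,Ann\r\n'
```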

+1

With t (text mode) in non-POSIX environments (for example, MS-DOS and MS Windows), the sequence \r\n is converted to \n on input (and vice versa on output). b (binary mode) performs no such translation.

Presumably, the CSV library deals with carriage returns itself (perhaps ignoring them whenever it encounters them).


Edit: I just noticed the question was modified.

Since .CSV files are not really meant for human readers, the library could output them with only \n (linefeed (LF), aka newline) separators. That would only be a real drawback for an MS Windows user opening the file with Notepad: it will not display well. The CSV library could also output files with \r\n (CR LF), since most programs cope with the MS-DOS convention.

In any case, the library can write via b (binary) mode just fine. It is possible that if the file is opened in t (text) mode on Windows, the line separators will come out slightly wrong, like \r\r\n, because the \n of each \r\n is expanded again. Some CSV parsers probably ignore the stray CR, while others treat it as an extra, empty line after each row.

+ is explained on the man page:

w+  Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.

The difference is that w+ allows both reading and writing, while w allows only writing.
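A brief Python 3 sketch of that difference: reading from a file opened with plain w fails, while w+ permits it. The file name is illustrative.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "plus.txt")  # hypothetical file name

# 'w' is write-only: attempting to read raises io.UnsupportedOperation.
with open(path, "w") as f:
    f.write("data")
    try:
        f.read()
        readable_in_w = True
    except io.UnsupportedOperation:
        readable_in_w = False

# 'w+' allows reading back what was just written (after rewinding).
with open(path, "w+") as f:
    f.write("data")
    f.seek(0)
    read_back = f.read()

print(readable_in_w, read_back)
```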

0

I have never gotten a good explanation of why I shouldn't just open ASCII files in binary mode.

I have never seen opening a file in binary mode corrupt data.

I have seen opening a file in ASCII (text) mode change or corrupt the returned data. So I, and I assume most experienced Python programmers, generally open files in binary mode unless there is some guarantee that the file will never contain binary characters.

0

Because opening a file in text mode changes how newlines are handled depending on the operating system, the authors of the csv module may have decided that they wanted more control and preferred to handle newlines themselves. This may have let them fix bugs caused by inconsistencies when files created under one OS were processed under another, where text-mode reading mangled the data in some corner cases. Perhaps no such bugs were found, but they wanted to rule out the possibility in the future. Or perhaps, since they have to deal with newlines anyway, bypassing text processing is simply faster.

Logically, since you cannot control which OS a file you read came from, using binary mode may be the better general approach. When writing a text file, however, you may do well to use text mode and leave it to the core routines to handle newlines for the current OS.

"+" is discussed in the question 'Confused by python file mode "w+"'.

0

For the Python csv module in particular, the answer is simple: it is required by the documentation.

If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.

Source: https://docs.python.org/2.7/library/csv.html#csv.reader
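For code that has to run on both major versions, one possible approach is a small wrapper that picks the mode each version's csv documentation asks for. This is only a sketch: open_for_csv_write is a hypothetical helper, not part of the csv module, and the file name is made up.

```python
import csv
import os
import sys
import tempfile

def open_for_csv_write(path):
    """Hypothetical helper: open `path` as each major version's csv docs suggest."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode with newline translation switched off.
        return open(path, "w", newline="")
    # Python 2: binary mode, as the quoted documentation requires.
    return open(path, "wb")

path = os.path.join(tempfile.mkdtemp(), "out.csv")  # hypothetical file name
f = open_for_csv_write(path)
try:
    csv.writer(f).writerow(["a", "b"])
finally:
    f.close()

with open(path, "rb") as f:
    written = f.read()

print(written)  # b'a,b\r\n' on Python 3
```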

0
