Python 3: gzip.open() and modes

https://docs.python.org/3/library/gzip.html

I am considering using gzip.open(), and I'm a bit confused by the mode argument:

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt' or 'xt' for text mode. The default value is "rb".

What is the difference between 'w' and 'wb'?

The documentation states that both are binary modes.

Does this mean that there is no difference between 'w' and 'wb'?

python gzip mode
Feb 02 '17 at 21:57
2 answers

This means that 'r' defaults to 'rb'; if you want text, you must ask for it explicitly with 'rt'. In the same way, 'w' means 'wb'.

(This is unlike the built-in open(), where 'r' means 'rt', not 'rb'.)
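To make the contrast concrete, here is a minimal sketch comparing the type returned by read() in each case. The scratch file names are hypothetical and exist only for this illustration.

```python
import gzip
import os
import tempfile

# Hypothetical scratch files, used only for this illustration.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "demo.txt.gz")
plain_path = os.path.join(tmpdir, "demo.txt")

# Write some text through gzip in text mode.
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("hello\n")

# gzip.open: plain 'r' is binary, so read() returns bytes.
with gzip.open(gz_path, "r") as f:
    gzip_r = f.read()

# gzip.open: text must be requested explicitly with 'rt'.
with gzip.open(gz_path, "rt", encoding="utf-8") as f:
    gzip_rt = f.read()

# Built-in open: plain 'r' is text mode, so read() returns str.
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("hello\n")
with open(plain_path, "r", encoding="utf-8") as f:
    builtin_r = f.read()

print(type(gzip_r).__name__, type(gzip_rt).__name__, type(builtin_r).__name__)
```

On a typical CPython 3 install this should report bytes for the first read and str for the other two.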

Feb 02 '17 at 22:02

Exactly as you say, and as Jean-François Fabre's answer already describes.
I just wanted to show some code, because it was fun.
Let's look at the source of gzip.py in the Python standard library to see what actually happens.
gzip.open() can be found at https://github.com/python/cpython/blob/master/Lib/gzip.py and is reproduced below:

    def open(filename, mode="rb", compresslevel=9,
             encoding=None, errors=None, newline=None):
        """Open a gzip-compressed file in binary or text mode.

        The filename argument can be an actual filename (a str or bytes object), or
        an existing file object to read from or write to.

        The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
        binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
        "rb", and the default compresslevel is 9.

        For binary mode, this function is equivalent to the GzipFile constructor:
        GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
        and newline arguments must not be provided.

        For text mode, a GzipFile object is created, and wrapped in an
        io.TextIOWrapper instance with the specified encoding, error handling
        behavior, and line ending(s).

        """
        if "t" in mode:
            if "b" in mode:
                raise ValueError("Invalid mode: %r" % (mode,))
        else:
            if encoding is not None:
                raise ValueError("Argument 'encoding' not supported in binary mode")
            if errors is not None:
                raise ValueError("Argument 'errors' not supported in binary mode")
            if newline is not None:
                raise ValueError("Argument 'newline' not supported in binary mode")

        gz_mode = mode.replace("t", "")
        if isinstance(filename, (str, bytes, os.PathLike)):
            binary_file = GzipFile(filename, gz_mode, compresslevel)
        elif hasattr(filename, "read") or hasattr(filename, "write"):
            binary_file = GzipFile(None, gz_mode, compresslevel, filename)
        else:
            raise TypeError("filename must be a str or bytes object, or a file")

        if "t" in mode:
            return io.TextIOWrapper(binary_file, encoding, errors, newline)
        else:
            return binary_file
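The mode checks at the top of this function can be exercised directly. The sketch below is only an illustration: try_open is a hypothetical helper, not part of gzip, and it uses an in-memory io.BytesIO so nothing is written to disk.

```python
import gzip
import io


def try_open(mode, **kwargs):
    """Return the name of the exception raised by gzip.open, or None.

    Hypothetical helper for this illustration; it opens an in-memory
    buffer with the given mode and closes it again straight away.
    """
    try:
        gzip.open(io.BytesIO(), mode, **kwargs).close()
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None


# 't' and 'b' together are rejected.
print(try_open("wtb"))
# Text-only arguments are rejected in binary mode, with or without 'b'.
print(try_open("wb", encoding="utf-8"))
print(try_open("w", encoding="utf-8"))
# Plain binary and plain text modes are accepted.
print(try_open("w"))
print(try_open("wt", encoding="utf-8"))
```

Note that "w" and "wb" take exactly the same path through these checks: both land in the else branch, because neither contains "t".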

A few observations:

  • The default mode is rb, as stated in the documentation you quoted.
  • When opening in binary mode, it makes no difference whether you pass, for example, "r" or "rb", "w" or "wb".
    We can see this in the following lines:

     gz_mode = mode.replace("t", "")
     if isinstance(filename, (str, bytes, os.PathLike)):
         binary_file = GzipFile(filename, gz_mode, compresslevel)
     elif hasattr(filename, "read") or hasattr(filename, "write"):
         binary_file = GzipFile(None, gz_mode, compresslevel, filename)
     else:
         raise TypeError("filename must be a str or bytes object, or a file")

     if "t" in mode:
         return io.TextIOWrapper(binary_file, encoding, errors, newline)
     else:
         return binary_file

    In other words, binary_file is created the same way whether or not the extra b is present; at this point gz_mode may or may not contain b.
    Next, the class GzipFile(_compression.BaseStream) is instantiated to build binary_file.

The following lines are important in the constructor:

    if mode and ('t' in mode or 'U' in mode):
        raise ValueError("Invalid mode: {!r}".format(mode))
    if mode and 'b' not in mode:
        mode += 'b'
    if fileobj is None:
        fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    if filename is None:
        filename = getattr(fileobj, 'name', '')
        if not isinstance(filename, (str, bytes)):
            filename = ''
    else:
        filename = os.fspath(filename)
    if mode is None:
        mode = getattr(fileobj, 'mode', 'rb')

Here you can clearly see that if 'b' is not in mode, it is appended:

    if mode and 'b' not in mode:
        mode += 'b'

so there is no difference between the two modes, as already discussed.
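As a quick sanity check of that conclusion, the sketch below writes the same payload with 'w' and with 'wb' and compares what comes back. The scratch directory and file names are hypothetical; this is an illustration rather than a proof.

```python
import gzip
import os
import tempfile

payload = b"same bytes either way\n"
tmpdir = tempfile.mkdtemp()  # hypothetical scratch directory

results = {}
for mode in ("w", "wb"):
    path = os.path.join(tmpdir, "demo_" + mode + ".gz")
    with gzip.open(path, mode) as f:
        # Record the type of object that gzip.open returned.
        kind = type(f).__name__
        f.write(payload)
    with gzip.open(path) as f:  # default mode is 'rb'
        results[mode] = (kind, f.read())

# Because 'w' is binary, writing a str is rejected with TypeError.
str_rejected = False
try:
    with gzip.open(os.path.join(tmpdir, "demo_str.gz"), "w") as f:
        f.write("text")  # str, not bytes
except TypeError:
    str_rejected = True

print(results["w"] == results["wb"], str_rejected)
```

Both modes give back a GzipFile and round-trip the same bytes, and 'w' refuses a str just as 'wb' would.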

Feb 02 '17 at 22:34