Difference between open and codecs.open in Python

There are two ways to open a text file in Python:

f = open(filename) 

and

 import codecs
 f = codecs.open(filename, encoding="utf-8")

When is codecs.open preferable to open?

+72
python unicode codec
Mar 09 '11 at 18:56
7 answers

Since Python 2.6, it is good practice to use io.open(), which also accepts an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the built-in open(). So io.open() works in Python 2.6 and all later versions, including Python 3.4. See the docs: http://docs.python.org/3.4/library/io.html

Now, for the original question: when reading text (including "plain text", HTML, XML, and JSON) in Python 2, you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. That way you either get correctly decoded Unicode or get an error right off the bat, which makes debugging much easier.
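As a minimal sketch of that advice, the snippet below writes and reads a throwaway file with an explicit encoding. The scratch path and sample text are illustrative assumptions, not part of the original answer. It shows that the right encoding yields decoded text, while the wrong one fails immediately.

```python
import io
import os
import tempfile

# Illustrative sample: text that cannot be represented in pure ASCII.
text = u"caf\u00e9 \u20ac5"

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write with an explicit encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read back with the same explicit encoding: we get decoded text.
with io.open(path, encoding="utf-8") as f:
    decoded = f.read()

# Reading with the wrong encoding fails right away instead of
# silently producing garbage later on.
try:
    with io.open(path, encoding="ascii") as f:
        f.read()
    failed_early = False
except UnicodeDecodeError:
    failed_early = True
```

On Python 3, plain open() can be substituted for io.open() here without any other change.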

Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em dashes, bullets, € (euro signs), and even the diaeresis (¨). Don't be naive! (And let's not forget the Façade design pattern!)

Since pure ASCII is not a real option, open() without an explicit encoding is only useful for reading binary files.

+71
Mar 09 '14 at 10:13

Personally, I always use codecs.open unless there is a clearly identified need to use open**. The reason is that I have been bitten so many times by UTF-8 input sneaking into my programs. "Oh, I just know it will always be ASCII" tends to be an assumption that gets broken often.

Assuming "utf-8" as the default encoding tends to be the safer choice in my experience, since ASCII can be treated as UTF-8, but the converse is not true. And in the cases where I truly know that the input is ASCII, I still use codecs.open, because I firmly believe that "explicit is better than implicit".

** - in Python 2.x; as the comment on the question states, in Python 3 open replaces codecs.open
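A small sketch of why UTF-8 is the safer assumption (the sample strings are made up for illustration): pure-ASCII bytes decode the same way under both codecs, whereas UTF-8 bytes for non-ASCII text are rejected by the ASCII codec.

```python
# Any pure-ASCII byte sequence decodes identically as ASCII and as UTF-8.
ascii_bytes = b"plain text"
same_result = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# The converse does not hold: UTF-8 bytes for non-ASCII text
# are not valid ASCII.
utf8_bytes = u"na\u00efve".encode("utf-8")
try:
    utf8_bytes.decode("ascii")
    converse_holds = True
except UnicodeDecodeError:
    converse_holds = False
```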

+17
Nov 01

Python 2 has both unicode strings and bytestrings. If you only use bytestrings, you can read from and write to a file opened with open() just fine. After all, those strings are just bytes.

The problem arises when, say, you have a string in Unicode and you do the following:

 >>> example = u'Μου αρέσει Ελληνικά'
 >>> open('sample.txt', 'w').write(example)
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

So you either have to encode the unicode string to UTF-8 explicitly, or use codecs.open to do it for you transparently.
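Both options can be sketched as follows. This version is written to run on Python 3; the scratch directory and file names are hypothetical, and the warning filter is an assumption added because newer Python 3 releases may flag codecs.open as deprecated.

```python
import codecs
import os
import tempfile
import warnings

example = u"Μου αρέσει Ελληνικά"

tmpdir = tempfile.mkdtemp()  # hypothetical scratch directory
path_a = os.path.join(tmpdir, "explicit.txt")
path_b = os.path.join(tmpdir, "codecs.txt")

# Option 1: encode explicitly and write the resulting bytes.
with open(path_a, "wb") as f:
    f.write(example.encode("utf-8"))

# Option 2: let codecs.open encode transparently on write.
# Newer Python 3 releases may warn that codecs.open is deprecated,
# so that warning is silenced for this sketch.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    with codecs.open(path_b, "w", encoding="utf-8") as f:
        f.write(example)

# Both files should now contain identical UTF-8 bytes.
with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
    bytes_a, bytes_b = fa.read(), fb.read()
```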

If you only ever use bytestrings, then no problem:

 >>> example = 'Μου αρέσει Ελληνικά'
 >>> open('sample.txt', 'w').write(example)
 >>>

It gets more involved than this, because when you concatenate a unicode string and a bytestring with the + operator, you get a unicode string. It is easy to get bitten by that one.

Also, codecs.open does not like being passed bytestrings that contain non-ASCII characters:

 >>> codecs.open('test', 'w', encoding='utf-8').write('Μου αρέσει')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/python2.7/codecs.py", line 691, in write
     return self.writer.write(data)
   File "/usr/lib/python2.7/codecs.py", line 351, in write
     data, consumed = self.encode(object, self.errors)
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

The usual advice about strings and I/O is "convert to unicode as early as possible and back to bytestrings as late as possible". Using codecs.open makes the latter very easy.

Just be careful that you pass it unicode strings, not bytestrings that may contain non-ASCII characters.
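The "decode early, encode late" advice can be sketched like this. The shout helper and the sample bytes are hypothetical stand-ins for real processing and real input.

```python
def shout(text):
    """Hypothetical processing step that works purely on Unicode text."""
    return text.upper()

# Bytes as they might arrive from a file or a socket.
raw_in = u"αρέσει".encode("utf-8")

text = raw_in.decode("utf-8")     # decode as early as possible
result = shout(text)              # all processing happens on Unicode text
raw_out = result.encode("utf-8")  # encode as late as possible
```

Keeping bytes at the edges and Unicode in the middle means the encoding only has to be named in two places.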

+8
Nov 12 '13 at 18:16

When you need to open a file with a specific encoding, you should use the codecs module.

+6
Mar 09 '11 at 18:57

codecs.open, I suppose, is just a remnant of the Python 2 days, when the built-in open had a much simpler interface and fewer capabilities. In Python 2, the built-in open does not accept an encoding argument, so if you wanted something other than binary mode or the default encoding, you were supposed to use codecs.open.

In Python 2.6, the io module came to the rescue and made things a bit simpler. According to the official documentation:

 New in version 2.6. The io module provides the Python interfaces to stream handling. Under Python 2.x, this is proposed as an alternative to the built-in file object, but in Python 3.x it is the default interface to access files and streams. 

Having said that, the only use I can think of for codecs.open today is backward compatibility. In all other scenarios (unless you are using Python < 2.6), it is preferable to use io.open. Also, in Python 3.x io.open is the same as the built-in open.

Note:

codecs.open and io.open also differ in their signatures.

codecs.open :

 open(filename, mode='rb', encoding=None, errors='strict', buffering=1) 

io.open :

 open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) 
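Beyond the signatures, one behavioral difference is that codecs.open always opens the underlying file in binary mode, so it does not translate newlines, while io.open in text mode does. A minimal sketch on Python 3, assuming a hypothetical scratch file (the warning filter is there only because newer Python 3 releases may flag codecs.open as deprecated):

```python
import codecs
import io
import os
import tempfile
import warnings

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# Write Windows-style line endings as raw bytes.
with open(path, "wb") as f:
    f.write(b"a\r\nb\r\n")

# io.open in text mode applies universal newline translation by default.
with io.open(path, encoding="utf-8") as f:
    via_io = f.read()

# codecs.open opens the underlying file in binary mode,
# so line endings come back untouched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()
```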
+5
Dec 12 '18 at 13:08

When you work with text files and want transparent encoding and decoding into Unicode objects.

+3
Mar 09 '11 at 18:59
  • If you want to load a binary file, use f = io.open(filename, 'rb').

  • To open a text file, always use f = io.open(filename, encoding='utf-8') with an explicit encoding.

However, in Python 3, open does the same thing as io.open and can be used instead.
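The two modes can be compared with a short sketch that runs on Python 3. The scratch file and its contents are illustrative assumptions.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\u00fcber")

# Binary mode returns the raw bytes, untouched.
with io.open(path, "rb") as f:
    raw = f.read()

# Text mode with an explicit encoding returns decoded text.
with io.open(path, encoding="utf-8") as f:
    text = f.read()

# On Python 3 the built-in open and io.open are the same object.
same_function = io.open is open
```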

Note: codecs.open is slated for deprecation, having been superseded by io.open since the latter's introduction in Python 2.6. I would use it only if the code has to be compatible with earlier versions of Python. For more information about codecs and Unicode in Python, see the Unicode HOWTO.

+1
Sep 11 '18 at 21:57


