Python 2 has both unicode strings and bytestrings. If you only use bytestrings, you can read from and write to a file opened with open() just fine. After all, those strings are just bytes.
The problem arises when, say, you have a string in Unicode and you do the following:
>>> example = u'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
So the obvious fix is to either explicitly encode the unicode string as UTF-8, or use codecs.open to do this transparently for you.
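Here is a minimal sketch of both fixes, assuming UTF-8 is the encoding you want on disk. The file names and the temporary directory are only illustrative, and the files are opened in binary mode so the snippet behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

example = u'Μου αρέσει Ελληνικά'
tmpdir = tempfile.mkdtemp()  # illustrative scratch location

# Option 1: encode explicitly, then write the resulting bytes.
path_explicit = os.path.join(tmpdir, 'sample_explicit.txt')
with open(path_explicit, 'wb') as f:
    f.write(example.encode('utf-8'))

# Option 2: let codecs.open do the encoding on the way out.
path_codecs = os.path.join(tmpdir, 'sample_codecs.txt')
with codecs.open(path_codecs, 'w', encoding='utf-8') as f:
    f.write(example)

# Read both files back as raw bytes to compare what ended up on disk.
with open(path_explicit, 'rb') as f:
    explicit_bytes = f.read()
with open(path_codecs, 'rb') as f:
    codecs_bytes = f.read()
```

Both files should end up with identical bytes; the only difference is where the encoding step happens.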
If you only ever use bytestrings, then no problem:
>>> example = 'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
>>>
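It gets hairier than this, because when you combine a unicode string and a bytestring using the + operator, you get a unicode string: Python 2 implicitly decodes the bytestring as ASCII, which fails as soon as it contains non-ASCII bytes. It is easy to get bitten by that.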
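One way to sidestep the implicit conversion is to decode bytestrings yourself before mixing them with unicode. The sketch below uses a hypothetical helper, to_unicode, which is not part of the standard library, and assumes the bytes are UTF-8 encoded. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: return value as a unicode string.

    Bytestrings are decoded explicitly with the given encoding;
    anything else is returned unchanged.
    """
    if isinstance(value, bytes):  # on Python 2, `bytes` is an alias of `str`
        return value.decode(encoding)
    return value

greeting_bytes = u'Μου αρέσει'.encode('utf-8')  # bytestring with non-ASCII bytes
suffix = u' Ελληνικά'                           # unicode string

# Decode first, then concatenate, so no implicit ASCII decode is involved.
combined = to_unicode(greeting_bytes) + suffix
```

The point of the design is that the encoding is named in exactly one place, instead of being left to the interpreter's ASCII default.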
Also, codecs.open does not like bytes with non-ASCII characters:
>>> codecs.open('test', 'w', encoding='utf-8').write('Μου αρέσει')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
The usual advice about strings and I/O is "convert to unicode as early as possible and back to bytes as late as possible." Using codecs.open makes the latter very easy.
Just be careful to give it unicode strings, not bytestrings that may contain non-ASCII characters.
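The "decode early, encode late" pattern can be sketched as a small round trip. This assumes the input file is UTF-8 encoded; the file names in.txt and out.txt are made up for the example, and the code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

tmpdir = tempfile.mkdtemp()  # illustrative scratch location
src = os.path.join(tmpdir, 'in.txt')
dst = os.path.join(tmpdir, 'out.txt')

# Set up an input file containing UTF-8 bytes.
with open(src, 'wb') as f:
    f.write(u'Μου αρέσει'.encode('utf-8'))

# Decode early: codecs.open hands back unicode when reading.
with codecs.open(src, 'r', encoding='utf-8') as f:
    text = f.read()

# Work purely with unicode strings in the middle.
text = text + u' Ελληνικά'

# Encode late: codecs.open turns unicode into bytes when writing.
with codecs.open(dst, 'w', encoding='utf-8') as f:
    f.write(text)

# Inspect the raw bytes that were written.
with open(dst, 'rb') as f:
    raw = f.read()
```

Everything between the read and the write only ever sees unicode, so the implicit ASCII conversions never get a chance to fire.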