open() and codecs.open() in Python 2.7 behave strangely

Question

I have a text file whose first line contains Unicode characters, while all other lines are ASCII. I am trying to read the first line into one variable and all the remaining lines into another. However, when I use the following code:

    # -*- coding: utf-8 -*-
    import codecs
    import os

    filename = '1.txt'

    f = codecs.open(filename, 'r', encoding='utf-8')
    print f
    names_f = f.readline().split(' ')
    data_f = f.readlines()
    print len(names_f)
    print len(data_f)
    f.close()

    print 'And now for something completely differerent:'

    g = open(filename, 'r')
    names_g = g.readline().split(' ')
    print g
    data_g = g.readlines()
    print len(names_g)
    print len(data_g)
    g.close()

I get the following output:

    <open file '1.txt', mode 'rb' at 0x01235230>
    28
    7
    And now for something completely differerent:
    <open file '1.txt', mode 'r' at 0x017875A0>
    28
    77

If I do not use readlines(), the entire file is read, not just the first 7 lines, with both codecs.open() and open().

Why is this happening? And why is the codecs.open() file opened in binary mode despite the 'r' parameter?

Update: this is the original file: http://www1.datafilehost.com/d/0792d687

python python-2.7 file-io python-unicode codec

Kriattiffer Apr 21 '13 at 12:04

1 answer

Martijn Pieters · Accepted Answer · 2013-04-22T19:24:36+0000

Because you called .readline() first, the codecs.open() file object has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.

If you call .readlines() again, the rest of the lines are returned:

    >>> f = codecs.open(filename, 'r', encoding='utf-8')
    >>> line = f.readline()
    >>> len(f.readlines())
    7
    >>> len(f.readlines())
    71
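The buffering effect described above can be mimicked with a small toy reader. This is only a sketch, loosely modelled on the explanation in this answer: the class name ToyLineBufferReader, the chunk size, and the sample text are all hypothetical, and the code is not taken from the codecs module.

```python
import io


class ToyLineBufferReader(object):
    """Toy model of a read-ahead line buffer (hypothetical, not the codecs source)."""

    def __init__(self, stream, chunk_size=16):
        self.stream = stream
        self.chunk_size = chunk_size
        self.buffered = []  # text read ahead, split into lines

    def readline(self):
        # Read ahead in fixed-size chunks until the first buffered line is complete.
        while not self.buffered or not self.buffered[0].endswith(u'\n'):
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break  # end of input
            self.buffered = (u''.join(self.buffered) + chunk).splitlines(True)
        return self.buffered.pop(0) if self.buffered else u''

    def readlines(self):
        # The modelled quirk: if read-ahead text is left over, hand back only
        # that text instead of continuing to the end of the stream.
        if self.buffered:
            lines, self.buffered = self.buffered, []
            return lines
        return self.stream.read().splitlines(True)


# Sample input (assumed): one header line followed by ten data lines.
text = u'h1 h2\n' + u''.join(u'row%d\n' % i for i in range(10))
reader = ToyLineBufferReader(io.StringIO(text))
header = reader.readline()
first_batch = reader.readlines()   # only what readline() had read ahead
second_batch = reader.readlines()  # the remainder of the stream
print(len(first_batch))
print(len(second_batch))
```

In this toy model the first readlines() call returns only the two lines that readline() happened to pull in with its chunk, and the second call returns the remaining eight, which mirrors the 7 and 71 split seen in the session above.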

The workaround is to not mix .readline() and .readlines():

    f = codecs.open(filename, 'r', encoding='utf-8')
    data_f = f.readlines()
    names_f = data_f.pop(0).split(' ')  # take the first line.
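The same idea can be wrapped in a small helper that works on any already opened file object. This is a minimal sketch: the function name split_header, its sep parameter, and the in-memory sample are assumptions made for illustration, and the file object is expected to yield unicode lines (as codecs.open() and io.open() do).

```python
import io


def split_header(fileobj, sep=u' '):
    """Return (names, data_lines) using a single readlines() call.

    Hypothetical helper: it never calls readline(), so the two methods
    are not mixed on the same file object.
    """
    lines = fileobj.readlines()
    if not lines:
        return [], []
    names = lines.pop(0).split(sep)  # take the first line
    return names, lines


# In-memory stand-in for the real file (assumed sample content).
sample = io.StringIO(u'n1 n2 n3\nrow 1\nrow 2\n')
names_f, data_f = split_header(sample)
print(len(names_f))
print(len(data_f))
```

As in the original snippet, the last name keeps its trailing newline; strip it before splitting if that matters for your data.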

This behavior is really a bug; the Python developers are aware of it, see issue 8260.

Another option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is much more robust and versatile than the codecs module.
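A minimal sketch of the io.open() variant follows. The helper name read_names_and_data, the temporary sample file, and its contents are assumptions made for illustration; only io.open() itself comes from the answer. With io.open(), reading the first line with readline() and the rest with readlines() should give the full remainder of the file.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_names_and_data(path, encoding='utf-8'):
    """Read the first line as names and the rest as data lines (hypothetical helper)."""
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().split(u' ')
        data = f.readlines()
    return names, data


# Build a small sample file (assumed content): a non-ASCII header line
# followed by 100 ASCII data lines.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample.txt')
with io.open(sample_path, 'w', encoding='utf-8') as out:
    out.write(u'\u0430\u0431\u0432 \u0433\u0434\u0435\n')
    for i in range(100):
        out.write(u'row %d\n' % i)

names, data = read_names_and_data(sample_path)
print(len(names))
print(len(data))
```

The with block also closes the file automatically, so no explicit close() call is needed.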
