How can I read two lines from a file at a time using python

Question

I am writing a Python script that parses a text file. The format of this text file is such that each element in the file uses two lines, and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like something like:

 f = open(filename, "r")
 for line in f:
     line1 = line
     line2 = f.readline()
 f.close()

But this fails with:

ValueError: mixing iterations and reading methods will lose data

Related:

What is the most "pythonic" way to iterate over a list in pieces?

+70

python

Daniel Nov 01 '09 at 14:27

14 answers

 import itertools

 with open('a') as f:
     for line1, line2 in itertools.zip_longest(*[f]*2):
         print(line1, line2)

itertools.zip_longest() returns an iterator, so it will work well even if the file is billions of lines long.

If there is an odd number of lines, then line2 is set to None in the last iteration.

In Python 2, you need to use izip_longest instead.

The comments asked whether this solution reads the entire file first and then iterates over the file a second time. I believe that it does not. The line with open('a') as f opens a file descriptor but does not read the file. f is an iterator, so its contents are not read until requested. zip_longest takes iterators as arguments and returns an iterator.

zip_longest is indeed given the same iterator f twice. But what ends up happening is that next(f) is called for the first argument, and then again for the second argument. Since next() is called on the same underlying iterator, consecutive lines are returned. This is very different from reading in the whole file. In fact, the purpose of using iterators is precisely to avoid reading in the whole file.

Therefore, I believe that the solution works as it should - the file is read only once by the for loop.
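
The shared-iterator behaviour can be checked on a small scale. The following is a minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for a real file; the sample text and variable names are made up for illustration:

```python
import io
import itertools

# In-memory stand-in for an open text file with an odd number of lines.
f = io.StringIO("a1\na2\nb1\nb2\nc1\n")

# [f] * 2 is a list holding the *same* iterator object twice, so
# zip_longest alternates its next() calls on one underlying stream.
args = [f] * 2
pairs = list(itertools.zip_longest(*args))

print(args[0] is args[1])
print(pairs)
```

Each tuple holds two consecutive lines, and the unpaired last line is padded with None.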

To confirm this, I ran the zip_longest solution and compared it with a solution using f.readlines(). I put input() at the end to pause each script, and ran ps axuw for each:

 % ps axuw | grep zip_longest_method.py

 unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile

 % ps axuw | grep readlines_method.py

 unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

readlines clearly reads the entire file at once. Since zip_longest_method uses much less memory, I think we can safely conclude that it does not read the entire file at once.
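
A similar comparison can be made from inside Python. This is only a rough sketch, assuming Python 3 and the standard tracemalloc module; the helper names (peak_bytes, lazy_pairs, eager_readlines) and the temporary test file are hypothetical, and the numbers it prints are not comparable to the ps figures above:

```python
import os
import tempfile
import tracemalloc
from itertools import zip_longest


def peak_bytes(func, path):
    """Return the peak traced memory (in bytes) while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def lazy_pairs(path):
    # Only the current pair of lines is alive at any time.
    with open(path) as f:
        for line1, line2 in zip_longest(f, f):
            pass


def eager_readlines(path):
    # readlines() materialises every line in one list first.
    with open(path) as f:
        lines = f.readlines()
    for line1, line2 in zip(lines[0::2], lines[1::2]):
        pass


# Hypothetical test data: 50,000 short lines in a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line number %d\n" % i for i in range(50000))
    path = tmp.name

try:
    lazy_peak = peak_bytes(lazy_pairs, path)
    eager_peak = peak_bytes(eager_readlines, path)
finally:
    os.remove(path)

print("lazy peak: ", lazy_peak)
print("eager peak:", eager_peak)
```

The lazy version's peak should stay close to the size of the read buffer, while the readlines() version grows with the file.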

+38

unutbu Nov 01 '09 at 15:12

Use the file object's next() method (Python 2), e.g.:

 f = open("file")
 for line in f:
     print line
     nextline = f.next()
     print "next line", nextline
     # ....
 f.close()

+21

ghostdog74 Nov 01 '09 at 14:41

I would proceed like ghostdog74, only with the try on the outside and a few modifications:

 try:
     with open(filename) as f:
         for line1 in f:
             line2 = f.next()
             # process line1 and line2 here
 except StopIteration:
     print "(End)"  # do whatever you need to do with line1 alone

This keeps the code simple and robust. Using with closes the file if something goes wrong, and otherwise releases the resource as soon as you have exhausted the file and left the loop.

Note that with requires Python 2.6, or 2.5 with the with_statement feature enabled (from __future__ import with_statement).
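
In Python 3, file objects have no .next() method; the built-in next() is used instead, and it accepts a default that avoids the StopIteration handling altogether. A minimal sketch, assuming Python 3, with a hypothetical helper name (pairs) and io.StringIO standing in for the file:

```python
import io


def pairs(f):
    """Yield (line1, line2); line2 is None if the input ends on an odd line."""
    for line1 in f:
        # The default value is returned at EOF instead of raising StopIteration.
        line2 = next(f, None)
        yield line1, line2


sample = io.StringIO("x1\ny1\nx2\ny2\nx3\n")
result = list(pairs(sample))
print(result)
```

With a real file, `with open(filename) as f:` can be wrapped around the loop in the same way as above.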

+10

RedGlyph Nov 01 '09 at 16:00

How about this? Does anyone see a problem with it?

 f = open('file_name')
 for line, line2 in zip(f, f):
     print line, line2
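
One thing to be aware of with zip(f, f): zip stops as soon as either argument is exhausted, so with an odd number of lines the last line is consumed and silently dropped. A small sketch, assuming Python 3 and an in-memory io.StringIO in place of the file:

```python
import io

f = io.StringIO("1\n2\n3\n4\n5\n")

# zip pulls "5\n" for the first slot, finds nothing for the second,
# and stops without yielding the incomplete pair.
pairs = list(zip(f, f))
leftover = f.read()

print(pairs)
print(repr(leftover))
```

If the unpaired line matters, itertools.zip_longest keeps it and pads the missing value with None.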

+4

svural May 24 '12 at 15:35

Works for files with an even or odd number of lines. It simply ignores the unmatched last line.

 f = file("file")
 lines = f.readlines()
 for even, odd in zip(lines[0::2], lines[1::2]):
     print "even : ", even
     print "odd : ", odd
     print "end cycle"
 f.close()

If you have large files, this is the wrong approach: readlines() loads the entire file into memory. I once wrote a class that reads a file while saving the fseek position of the start of each line. This lets you retrieve specific lines without holding the whole file in memory, and you can also move backward and forward.

I paste it here. The license is public domain, meaning do what you want with it. Please note that this class was written 6 years ago, and I have not touched or tested it since. I think it is not even fully file-compliant. Caveat emptor. Also, note that this is overkill for your problem. I am not saying that you should definitely go this route, but I had this code and I enjoy sharing it in case you need more complex access.

 import string
 import re

 class FileReader:
     """
     Similar to file class, but allows to access smoothly the lines
     as when using readlines(), with no memory payload, going back and forth,
     finding regexps and so on.
     """
     def __init__(self, filename):  # fold>>
         self.__file = file(filename, "r")
         self.__currentPos = -1
         # get file length
         self.__file.seek(0, 0)
         counter = 0
         line = self.__file.readline()
         while line != '':
             counter = counter + 1
             line = self.__file.readline()
         self.__length = counter
         # collect an index of filedescriptor positions against
         # the line number, to enhance search
         self.__file.seek(0, 0)
         self.__lineToFseek = []

         while True:
             cur = self.__file.tell()
             line = self.__file.readline()
             # if it not null the cur is valid for
             # identifying a line, so store
             self.__lineToFseek.append(cur)
             if line == '':
                 break
     # <<fold

     def __len__(self):  # fold>>
         """
         member function for the operator len()
         returns the file length
         FIXME: better get it once when opening file
         """
         return self.__length
     # <<fold

     def __getitem__(self, key):  # fold>>
         """
         gives the "key" line. The syntax is

         import FileReader
         f=FileReader.FileReader("a_file")
         line=f[2]

         to get the second line from the file. The internal
         pointer is set to the key line
         """
         mylen = self.__len__()
         if key < 0:
             self.__currentPos = -1
             return ''
         elif key > mylen:
             self.__currentPos = mylen
             return ''

         self.__file.seek(self.__lineToFseek[key], 0)
         counter = 0
         line = self.__file.readline()
         self.__currentPos = key
         return line
     # <<fold

     def next(self):  # fold>>
         if self.isAtEOF():
             raise StopIteration
         return self.readline()
     # <<fold

     def __iter__(self):  # fold>>
         return self
     # <<fold

     def readline(self):  # fold>>
         """
         read a line forward from the current cursor position.
         returns the line or an empty string when at EOF
         """
         return self.__getitem__(self.__currentPos+1)
     # <<fold

     def readbackline(self):  # fold>>
         """
         read a line backward from the current cursor position.
         returns the line or an empty string when at Beginning of file.
         """
         return self.__getitem__(self.__currentPos-1)
     # <<fold

     def currentLine(self):  # fold>>
         """
         gives the line at the current cursor position
         """
         return self.__getitem__(self.__currentPos)
     # <<fold

     def currentPos(self):  # fold>>
         """
         return the current position (line) in the file
         or -1 if the cursor is at the beginning of the file
         or len(self) if it at the end of file
         """
         return self.__currentPos
     # <<fold

     def toBOF(self):  # fold>>
         """
         go to beginning of file
         """
         self.__getitem__(-1)
     # <<fold

     def toEOF(self):  # fold>>
         """
         go to end of file
         """
         self.__getitem__(self.__len__())
     # <<fold

     def toPos(self, key):  # fold>>
         """
         go to the specified line
         """
         self.__getitem__(key)
     # <<fold

     def isAtEOF(self):  # fold>>
         return self.__currentPos == self.__len__()
     # <<fold

     def isAtBOF(self):  # fold>>
         return self.__currentPos == -1
     # <<fold

     def isAtPos(self, key):  # fold>>
         return self.__currentPos == key
     # <<fold

     def findString(self, thestring, count=1, backward=0):  # fold>>
         """
         find the count occurrence of the string str in the file
         and return the line catched. The internal cursor is placed
         at the same line.
         backward is the searching flow.
         For example, to search for the first occurrence of "hello"
         starting from the beginning of the file do:

         import FileReader
         f=FileReader.FileReader("a_file")
         f.toBOF()
         f.findString("hello",1,0)

         To search the second occurrence string from the end of the
         file in backward movement do:

         f.toEOF()
         f.findString("hello",2,1)

         to search the first occurrence from a given (or current) position
         say line 150, going forward in the file

         f.toPos(150)
         f.findString("hello",1,0)

         return the string where the occurrence is found, or an empty string
         if nothing is found. The internal counter is placed at the
         corresponding line number, if the string was found. In other case,
         it set at BOF if the search was backward, and at EOF if the search
         was forward.

         NB: the current line is never evaluated. This is a feature, since
         we can so traverse occurrences with a

         line=f.findString("hello")
         while line == '':
             line.findString("hello")

         instead of playing with a readline every time to skip the current
         line.
         """
         internalcounter = 1
         if count < 1:
             count = 1
         while 1:
             if backward == 0:
                 line = self.readline()
             else:
                 line = self.readbackline()

             if line == '':
                 return ''
             if string.find(line, thestring) != -1:
                 if count == internalcounter:
                     return line
                 else:
                     internalcounter = internalcounter + 1
     # <<fold

     def findRegexp(self, theregexp, count=1, backward=0):  # fold>>
         """
         find the count occurrence of the regexp in the file
         and return the line catched. The internal cursor is placed
         at the same line.
         backward is the searching flow.
         You need to pass a regexp string as theregexp.
         returns a tuple. The fist element is the matched line. The subsequent
         elements contains the matched groups, if any.
         If no match returns None
         """
         rx = re.compile(theregexp)
         internalcounter = 1
         if count < 1:
             count = 1
         while 1:
             if backward == 0:
                 line = self.readline()
             else:
                 line = self.readbackline()

             if line == '':
                 return None
             m = rx.search(line)
             if m != None:
                 if count == internalcounter:
                     return (line,) + m.groups()
                 else:
                     internalcounter = internalcounter + 1
     # <<fold

     def skipLines(self, key):  # fold>>
         """
         skip a given number of lines. Key can be negative to skip
         backward. Return the last line read.
         Please note that skipLines(1) is equivalent to readline()
         skipLines(-1) is equivalent to readbackline() and skipLines(0)
         is equivalent to currentLine()
         """
         return self.__getitem__(self.__currentPos+key)
     # <<fold

     def occurrences(self, thestring, backward=0):  # fold>>
         """
         count how many occurrences of str are found from the current
         position (current line excluded... see skipLines()) to the
         begin (or end) of file.
         returns a list of positions where each occurrence is found,
         in the same order found reading the file.
         Leaves unaltered the cursor position.
         """
         curpos = self.currentPos()
         list = []
         line = self.findString(thestring, 1, backward)
         while line != '':
             list.append(self.currentPos())
             line = self.findString(thestring, 1, backward)
         self.toPos(curpos)
         return list
     # <<fold

     def close(self):  # fold>>
         self.__file.close()
     # <<fold

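The core idea of the class, an index of the offset at which each line starts, can be sketched much more briefly. This is an illustrative Python 3 sketch, not a port of the class: the function names (build_line_index, get_line) and the temporary sample file are hypothetical, and the file is opened in binary mode so that tell() and seek() work on plain byte offsets:

```python
import os
import tempfile


def build_line_index(path):
    """Return a list with the byte offset at which each line starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break  # EOF: no line starts at this position
            offsets.append(pos)
    return offsets


def get_line(path, offsets, n):
    """Return line n (0-based) as bytes by seeking straight to it."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline()


# Hypothetical sample file with four short lines.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\ndelta\n")
    path = tmp.name

try:
    index = build_line_index(path)
    third = get_line(path, index, 2)
    first = get_line(path, index, 0)
finally:
    os.remove(path)

print(index, third, first)
```

Only the list of offsets is kept in memory, so lines can be fetched in any order, including backward, without loading the whole file.
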
+3

Stefano Borini Nov 01 '09 at 14:43

 file_name = 'your_file_name'
 file_open = open (file_name, 'r')

 def handler (line_one, line_two):
     print (line_one, line_two)

 while file_open:
     try:
         one = file_open.next ()
         two = file_open.next () 
         handler (one, two)
     except (StopIteration):
         file_open.close ()
         break

+2

Martin P. Hellwig Nov 01 '09 at 14:45

 def readnumlines(file, num=2):
     f = iter(file)
     while True:
         lines = [None] * num
         for i in range(num):
             try:
                 lines[i] = f.next()
             except StopIteration:  # EOF or not enough lines available
                 return
         yield lines

 # use like this
 f = open("thefile.txt", "r")
 for line1, line2 in readnumlines(f):
     # do something with line1 and line2

 # or
 for line1, line2, line3, ..., lineN in readnumlines(f, N):
     # do something with N lines
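
The same N-lines-at-a-time idea can be written in Python 3 with itertools.islice. This is a sketch under that assumption, not the answer's own code; the name read_n_lines and the sample data are made up for illustration, and like the generator above it drops an incomplete final group:

```python
import io
from itertools import islice


def read_n_lines(f, n=2):
    """Yield lists of n lines; an incomplete final group is dropped."""
    while True:
        chunk = list(islice(f, n))  # take up to n lines from the iterator
        if len(chunk) < n:
            return
        yield chunk


sample = io.StringIO("a\nb\nc\nd\ne\nf\ng\n")
groups = list(read_n_lines(sample, 3))
print(groups)
```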

+2

Georg Schölly Nov 01 '09 at 15:33

 f = open(filename, "r")
 for line in f:
     line1 = line
     f.next()
 f.close()

This way you read every second line of the file. If you like, you can also check the state of f before calling f.next().

+1

Kimmi May 29 '13 at 16:03

My idea is to create a generator that reads two lines from a file at a time, and returns this as a 2-tuple. This means that you can iterate over the results.

 from cStringIO import StringIO

 def read_2_lines(src):
     while True:
         line1 = src.readline()
         if not line1: break
         line2 = src.readline()
         if not line2: break
         yield (line1, line2)

 data = StringIO("line1\nline2\nline3\nline4\n")
 for read in read_2_lines(data):
     print read

If you have an odd number of lines, this will not work perfectly (the unpaired last line is dropped), but it should give you a good outline.

0

Simon Callan Nov 01 '09 at 14:46

Last month I worked on a similar problem. I tried a while loop with f.readline() as well as f.readlines(). My data file is not huge, so I finally chose f.readlines(), which gives me more control over the index; otherwise I would have to use f.seek() to move the file pointer back and forth.

My case is more complicated than the OP's. The number of lines I need to parse each time varies, so I have to check a few conditions before I can parse the data.

Another problem I found with f.seek() is that it does not handle UTF-8 very well when I use codecs.open('', 'r', 'utf-8'). I am not quite sure about the culprit; in the end I abandoned this approach.

0

Dingle Nov 01 '09 at 18:36

Simple little reader. It pulls lines two at a time and returns them as a tuple as you iterate over the object. You can close it manually, or it will close itself when it falls out of scope.

 class doublereader:
     def __init__(self, filename):
         self.f = open(filename, 'r')
     def __iter__(self):
         return self
     def next(self):
         return self.f.next(), self.f.next()
     def close(self):
         if not self.f.closed:
             self.f.close()
     def __del__(self):
         self.close()

 # example usage one
 r = doublereader(r"C:\file.txt")
 for a, h in r:
     print "x:%s\ny:%s" % (a, h)
 r.close()

 # example usage two
 for x, y in doublereader(r"C:\file.txt"):
     print "x:%s\ny:%s" % (x, y)
 # closes itself as soon as the loop goes out of scope

0

Bo Buchanan Feb 28 2018-11-28T00:

If the file is of reasonable size, another approach is to use a list comprehension to read the entire file into a list of 2-tuples:

 filename = '/path/to/file/name'

 with open(filename, 'r') as f:
     list_of_2tuples = [(line, f.readline()) for line in f]

 for (line1, line2) in list_of_2tuples:  # Work with them in pairs.
     print('%s :: %s' % (line1, line2))

0

prismalytics.io Jun 29 '17 at 4:05

This Python code will print the first two lines (linecache.getline takes a 1-based line number):

 import linecache

 filename = "ooxx.txt"
 print(linecache.getline(filename, 1))
 print(linecache.getline(filename, 2))

-2

Timothy.hmchen Nov 01 '09 at 14:56

robince · Accepted Answer · 2009-11-01 14:35

A similar question is here. You cannot mix iteration and readline, so you need to use one or the other.

 while True:
     line1 = f.readline()
     line2 = f.readline()
     if not line2: break  # EOF
     ...
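
As a complete example of the readline-only approach, here is a minimal sketch assuming Python 3. The function name read_pairs and the two-line record format in the sample are hypothetical, and io.StringIO stands in for the open file:

```python
import io


def read_pairs(f):
    """Collect (line1, line2) pairs using only readline()."""
    pairs = []
    while True:
        line1 = f.readline()
        line2 = f.readline()
        if not line2:
            break  # EOF; an unpaired line1 is dropped here
        pairs.append((line1.rstrip("\n"), line2.rstrip("\n")))
    return pairs


records = read_pairs(io.StringIO("name: a\nvalue: 1\nname: b\nvalue: 2\n"))
print(records)
```

readline() returns an empty string at end of file, which is what the `if not line2` test relies on; a blank line inside the file is "\n" and does not end the loop.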
