Use readlines() with indexes or parse lines on the fly?

Question

I am writing a simple test function that asserts that the output from the interpreter I am developing is correct, by reading from a file an expression to evaluate and the expected result, similar to Python's doctest. This is for Scheme, so an example input file would be

    > 42
    42
    > (+ 1 2 3)
    6

My first attempt at a function that can parse such a file looks like this, and it works as expected:

    import os

    def run_test(filename):
        interp = Interpreter()
        response_next = False
        num_tests = 0
        with open(filename) as f:
            for line in f:
                if response_next:
                    # this line is the expected result of the previous test
                    assert response == line.rstrip('\n')
                    response_next = False
                elif line.startswith('> '):
                    num_tests += 1
                    response = interp.eval(line[2:])
                    response = str(response) if response else ''
                    response_next = True
        print("{:20} Ran {} tests successfully".format(os.path.basename(filename), num_tests))

I wanted to improve it a bit by removing the response_next flag, since I am not a fan of such flags, and instead read the next line inside the elif block with next(f). I had a small, unrelated question about that, which I asked on IRC at freenode. I got the help I wanted, but I was also advised to use f.readlines() instead and then index into the resulting list. (I was also told that I could use groupby() from itertools to pair up the lines, but I will look into that approach later.)

Now, to the question: I was very curious why that approach would be better, but my Internet connection on the train was flaky and I could not ask, so I am asking here instead. Why is it better to read everything with readlines() instead of parsing each line as it is read on the fly?

I am really wondering because my feeling is the opposite: it seems cleaner to parse the lines one at a time, so that everything is done in a single pass. I usually avoid indexing into arrays in Python and prefer working with iterators and generators. It may be impossible to answer and guess what the person was thinking if it was a subjective opinion, but if there is a general recommendation, I would be glad to hear it.

+4

python

Michael brennan Jul 11 '12 at 12:42


3 answers

Certainly it is more Pythonic to process input iteratively rather than reading the whole input at once; for example, this will still work if the input is a console.

The argument for reading the whole file into an array and indexing is that using next(f) can be unclear when combined with a for loop. The options there are either to replace the for loop with a while True loop, or to document clearly that you are calling next on f inside the loop:

    try:
        while True:
            test = next(f)
            response = next(f)
    except StopIteration:
        pass

As Jonas suggests, you could accomplish this (if you are sure that the input will always consist of test/response/test/response lines, and so on) by zipping the input with itself:

    for test, response in zip(f, f):             # Python 3
    for test, response in itertools.izip(f, f):  # Python 2
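To make the zip idea concrete, here is a minimal sketch in Python 3. It assumes the input strictly alternates test and response lines; the `read_pairs` helper and the in-memory `sample` are made up for illustration and are not part of the question's code.

```python
import io

def read_pairs(lines):
    """Yield (expression, expected) pairs from an iterable of lines."""
    it = iter(lines)
    # zip pulls from the same iterator twice per step, so the lines are
    # consumed two at a time: first the test, then its response.
    for test, response in zip(it, it):
        if not test.startswith('> '):
            raise ValueError('expected a test line, got {!r}'.format(test))
        yield test[2:].rstrip('\n'), response.rstrip('\n')

# Hypothetical stand-in for an open test file.
sample = io.StringIO(u"> 42\n42\n> (+ 1 2 3)\n6\n")
pairs = list(read_pairs(sample))
print(pairs)
```

One thing to watch: if the file has an odd number of lines, zip stops quietly and the last unpaired test line is dropped without any error.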

+1

ecatmur Jul 11 '12 at 12:58


    from itertools import ifilter, imap  # Python 2 only

    def run_test(filename):
        interp = Interpreter()
        num_tests, num_passed, last_result = 0, 0, None
        with open(filename) as f:
            # iterate over non-blank lines
            for line in ifilter(None, imap(str.strip, f)):
                if line.startswith('> '):
                    last_result = interp.eval(line[2:])
                else:
                    # any other line is the expected result of the last test
                    num_tests += 1
                    try:
                        assert line == repr(last_result)
                    except AssertionError as e:
                        print(e.message)
                    else:
                        num_passed += 1
        print("Ran {} tests, {} passed".format(num_tests, num_passed))

... it simply assumes that any result line refers to the previous test.

I would avoid .readlines() unless you get some benefit from having the whole file available at once.

I also changed the comparison to look at the repr of the result, so that it can distinguish between output types, i.e.

    '6' + '2'
    > '62'
    60 + 2
    > 62

0

Hugh bothwell Jul 11 '12 at 15:49


alexis · Accepted Answer · 2012-07-11T16:30:24+0000

Reading everything into an array gives you the equivalent of random access: you use an array index to move through the array, and at any time you can look at what comes next and back up if necessary.
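As a rough illustration of what random access buys you, the sketch below (Python 3) peeks at the following line before deciding whether to consume it. The rule that a test may have no result line at all is an assumption made up for this example, as are the `read_pairs_indexed` name and the sample input.

```python
import io

def read_pairs_indexed(f):
    """Pair each '> ' line with the line after it, using list indexing."""
    lines = [line.rstrip('\n') for line in f.readlines()]
    pairs = []
    i = 0
    while i < len(lines):
        if lines[i].startswith('> '):
            # Look ahead without consuming: the next line is the expected
            # result unless it is missing or is itself another test.
            has_result = (i + 1 < len(lines)
                          and not lines[i + 1].startswith('> '))
            expected = lines[i + 1] if has_result else ''
            pairs.append((lines[i][2:], expected))
            i += 2 if has_result else 1
        else:
            i += 1  # skip stray lines that belong to no test
    return pairs

# Hypothetical input: the first test prints nothing, so no result line.
sample = io.StringIO(u"> (define x 1)\n> x\n1\n")
indexed_pairs = read_pairs_indexed(sample)
print(indexed_pairs)
```

With a plain iterator, the same peek would need a pushback buffer or something like itertools.tee, which is where indexing starts to pay for itself.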

If you can carry out your task without backing up, you do not need random access, and it would be cleaner to do without it. In your examples, it seems that your syntax is always a single-line (?) expression followed by the expected response. So I would write a top-level loop that iterates once per expression-value pair, reading lines as needed. If you want to support multi-line expressions and results, you can write separate functions for reading each of them: one that reads a complete expression and one that reads the result (up to the next blank line). The important thing is that they should be able to consume as much input as they need and leave the input pointer in a reasonable state for the next input.
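A minimal sketch of that structure in Python 3 might look like the following. The function names are invented here, and the rule for detecting the end of an expression (parentheses balance, ignoring strings and comments) is only a simplifying assumption, not something taken from the question.

```python
import io

def read_expression(lines):
    """Read one expression, possibly spanning lines; None at end of input."""
    for line in lines:
        if line.startswith('> '):
            break
    else:
        return None  # ran out of input without finding another test
    parts = [line[2:].rstrip('\n')]
    # Hypothetical rule: the expression is complete once parentheses balance.
    while sum(p.count('(') - p.count(')') for p in parts) > 0:
        more = next(lines, None)
        if more is None:
            break
        parts.append(more.rstrip('\n'))
    return '\n'.join(parts)

def read_result(lines):
    """Read result lines up to the next blank line or end of input."""
    parts = []
    for line in lines:
        if not line.strip():
            break
        parts.append(line.rstrip('\n'))
    return '\n'.join(parts)

def read_tests(f):
    """Top-level loop: one iteration per expression/result pair."""
    lines = iter(f)  # both readers share this single iterator
    while True:
        expr = read_expression(lines)
        if expr is None:
            return
        yield expr, read_result(lines)

sample = io.StringIO(u"> (+ 1\n     2)\n3\n\n> 42\n42\n")
tests = list(read_tests(sample))
print(tests)
```

Each reader consumes exactly what it needs from the shared iterator, so no index or flag is required. Note that this layout relies on a blank line after each result, unlike the sample file in the question.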
