Here are three options:
    foo = """
    this is
    a multi-line string.
    """

    def f1(foo=foo): return iter(foo.splitlines())

    def f2(foo=foo):
        retval = ''
        for char in foo:
            retval += char if not char == '\n' else ''
            if char == '\n':
                yield retval
                retval = ''
        if retval:
            yield retval

    def f3(foo=foo):
        prevnl = -1
        while True:
            nextnl = foo.find('\n', prevnl + 1)
            if nextnl < 0: break
            yield foo[prevnl + 1:nextnl]
            prevnl = nextnl

    if __name__ == '__main__':
        for f in f1, f2, f3:
            print list(f())
Running this as the main script confirms that the three functions are equivalent. With timeit (and a * 100 for foo to get substantial strings for a more precise measurement):
    $ python -mtimeit -s'import asp' 'list(asp.f3())'
    1000 loops, best of 3: 370 usec per loop
    $ python -mtimeit -s'import asp' 'list(asp.f2())'
    1000 loops, best of 3: 1.36 msec per loop
    $ python -mtimeit -s'import asp' 'list(asp.f1())'
    10000 loops, best of 3: 61.5 usec per loop
Note that we need the list() call to ensure that the iterators are traversed, not just built.
IOW, the naive implementation is so much faster it isn't even funny: 6 times faster than my attempt with find calls, which in turn is 4 times faster than the lower-level approach.
Lessons to retain: measurement is always a good thing (but must be accurate); string methods such as splitlines are implemented in very fast ways; building strings by programming at a very low level (especially with loops of += on very small pieces) can be quite slow.
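To illustrate the point about the list() call, here is a minimal sketch that uses the timeit module from Python code instead of the command line. It assumes Python 3 (the timings above were taken with Python 2), the sample text and the repeat count are made up for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

# Hypothetical sample text: a short multi-line string repeated to get measurable work.
foo = "this is\na multi-line string.\n" * 100


def f1(foo=foo):
    # Let the built-in string method do the splitting.
    return iter(foo.splitlines())


def f3(foo=foo):
    # Scan for newlines with str.find and yield the slices in between.
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl


# Calling a generator function only builds the generator; no line is produced yet.
build_only = timeit.timeit(lambda: f3(), number=1000)
# Wrapping the call in list() forces the whole iteration to run.
traverse = timeit.timeit(lambda: list(f3()), number=1000)

print("f3 build only: %.6f s" % build_only)
print("f3 traversed:  %.6f s" % traverse)
print("f1 traversed:  %.6f s" % timeit.timeit(lambda: list(f1()), number=1000))
```

Without list(), the f3 measurement covers little more than creating a generator object, which says nothing about how fast the lines are actually produced.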
Edit: added @Jacob's proposal, slightly modified to give the same results as the others (trailing spaces on a line are kept), i.e.:
    from cStringIO import StringIO

    def f4(foo=foo):
        stri = StringIO(foo)
        while True:
            nl = stri.readline()
            if nl != '':
                yield nl.strip('\n')
            else:
                # Python 2 idiom; under PEP 479 (Python 3.7+) this
                # becomes a RuntimeError inside a generator.
                raise StopIteration
Measurement gives:
    $ python -mtimeit -s'import asp' 'list(asp.f4())'
    1000 loops, best of 3: 406 usec per loop
Not quite as good as the .find-based approach. Still, it's worth keeping in mind, because it may be less prone to small off-by-one bugs (any loop where you see occurrences of +1 and -1, like my f3 above, should automatically trigger off-by-one suspicions, and so should many loops that lack such adjustments and should have them, although I believe my code is also right, since I was able to check its output against the other functions).
But the split-based approach still rules.
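The off-by-one concern can be probed by cross-checking a find-based generator against splitlines on boundary inputs. The sketch below assumes Python 3, and lines_by_find is a hypothetical variant of f3, not the function above: f3 as written yields only pieces that are terminated by a newline, so this variant adds a final yield for text that does not end with one. The sample inputs are invented for the check.

```python
def lines_by_find(text):
    # Hypothetical variant of f3: same find-based scan, plus a final
    # yield for text that does not end with a newline.
    prevnl = -1
    while True:
        nextnl = text.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield text[prevnl + 1:nextnl]
        prevnl = nextnl
    tail = text[prevnl + 1:]
    if tail:
        yield tail


# Inputs chosen to probe the boundaries: empty text, no trailing newline,
# leading newline, and consecutive newlines. Only '\n' is used, because
# splitlines() also breaks on other line boundaries such as '\r'.
samples = ["", "a", "a\n", "a\nb", "\na\n", "a\n\nb\n", "\n\n"]

for text in samples:
    assert list(lines_by_find(text)) == text.splitlines(), repr(text)
```

A check like this is cheap to run whenever a loop contains +1 or -1 adjustments, and it uses the fast built-in method as the reference result.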
An aside: possibly a better style for f4 would be:
    from cStringIO import StringIO

    def f4(foo=foo):
        stri = StringIO(foo)
        while True:
            nl = stri.readline()
            if nl == '': break
            yield nl.strip('\n')
At least it's a little less verbose. Unfortunately, the need to strip the trailing \n characters rules out the clearer and faster replacement of the while loop with return iter(stri) (the iter part is redundant in modern versions of Python, I believe since 2.3 or 2.4, but it is also harmless). It might also be worth trying:
    return itertools.imap(lambda s: s.strip('\n'), stri)
or variations of it, but I'm stopping here, since this is pretty much a theoretical exercise compared with the splitlines-based approach, which is the simplest and fastest one.
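For readers on Python 3, where cStringIO and itertools.imap no longer exist, the StringIO variant can be sketched as follows. This is an adaptation under stated assumptions, not the code that was timed above: io.StringIO stands in for cStringIO.StringIO, a generator expression stands in for imap, and rstrip('\n') is used in place of strip('\n') since a line read this way can only carry its newline at the end.

```python
import io


def f4(foo):
    # io.StringIO objects are iterable line by line, each line keeping its '\n'.
    stri = io.StringIO(foo)
    # Strip only the trailing newline; other trailing whitespace is kept.
    return (line.rstrip('\n') for line in stri)


print(list(f4("this is \na multi-line string.\n")))
```

The generator expression keeps the evaluation lazy, so lines are produced only as the caller iterates.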