Length of generator output

I have these two implementations for computing the length of a generator's output while also saving the data for further processing:

    import itertools

    def count_generator1(generator):
        '''- build a list with the generator data
        - get the length of the data
        - return both the length and the original data (in a list)
        WARNING: the memory use is unbounded, and infinite generators will block this'''
        l = list(generator)
        return len(l), l

    def count_generator2(generator):
        '''- get two generators from the original generator
        - get the length of the data from one of them
        - return both the length and the original data, as returned by tee
        WARNING: tee can use up an unbounded amount of memory, and infinite generators will block this'''
        for_length, saved = itertools.tee(generator, 2)
        return sum(1 for _ in for_length), saved
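
For reference, a minimal usage sketch (the example generator is my own, not from the question), assuming a small finite input:

    def squares():
        # tiny illustrative generator
        for i in range(5):
            yield i * i

    # count_generator1 returns the length plus the data as a plain list
    length, data = count_generator1(squares())
    print(length, data)         # 5 [0, 1, 4, 9, 16]

    # count_generator2 returns the length plus a tee'd iterator over the same data
    length, saved = count_generator2(squares())
    print(length, list(saved))  # 5 [0, 1, 4, 9, 16]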

Both have flaws, both do the job. Can someone comment on them or even suggest a better alternative?

+6
python generator
Aug 2 '13 at 10:14
2 answers

If you need to do this, the first way is much better: since you consume all the values anyway, itertools.tee() would have to store all of them internally, which means a list will be more efficient.

To quote the docs:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
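
To make the docs' point concrete, here is a minimal sketch (with an illustrative generator, not from the answer) of why list() wins when one iterator is drained before the other starts: tee() has to buffer every item internally anyway, so a plain list holds the same data with less overhead.

    import itertools

    gen = (x * x for x in range(1000))

    # Draining one tee'd iterator first forces tee() to buffer all
    # 1000 items for the other iterator anyway...
    a, b = itertools.tee(gen)
    length = sum(1 for _ in a)   # consumes a, fills b's internal buffer

    # ...so materializing a list up front is simpler and faster here
    gen = (x * x for x in range(1000))
    data = list(gen)
    length = len(data)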

+11
Aug 02 '13 at 10:18

I ran timeit on 64-bit Windows, Python 3.4.3, against the approaches I could think of:

    >>> from timeit import timeit
    >>> from textwrap import dedent as d
    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for _ in s:
    ...         count += 1
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    50.70772041983173
    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for count, _ in enumerate(s):
    ...         pass
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    42.636973504498656
    >>> timeit(
    ...     d("""
    ...     count, _ = reduce(f, enumerate(range(1000)), (-1, -1))
    ...     count += 1
    ...     """),
    ...     d("""
    ...     from functools import reduce
    ...     def f(_, count):
    ...         return count
    ...     s = range(1000)
    ...     """),
    ... )
    121.15513102540672
    >>> timeit("count = sum(1 for _ in s)", "s = range(1000)")
    58.179126025925825
    >>> timeit("count = len(tuple(s))", "s = range(1000)")
    19.777029680237774
    >>> timeit("count = len(list(s))", "s = range(1000)")
    18.145157531932
    >>> timeit("count = len(list(1 for _ in s))", "s = range(1000)")
    57.41422175998332

Horrifyingly, the fastest approach was to use list (not even tuple) to consume the iterator and get the length from there:

 >>> timeit("count = len(list(s))", "s = range(1000)") 18.145157531932 

Of course, this comes with memory trade-offs. The best low-memory alternative was to use enumerate with a no-op for loop:

    >>> timeit(
    ...     d("""
    ...     count = -1
    ...     for count, _ in enumerate(s):
    ...         pass
    ...     count += 1
    ...     """),
    ...     "s = range(1000)",
    ... )
    42.636973504498656

Hurrah!
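
Putting the two results together, a rough sketch (the function names are my own) of how you might choose: the enumerate trick when you only need the count, and len(list(...)) when you also want to keep the data.

    def count_only(iterable):
        # low-memory count: consumes the iterable without storing it
        count = -1
        for count, _ in enumerate(iterable):
            pass
        return count + 1

    def count_and_save(iterable):
        # fastest in the benchmarks above, but stores everything in memory
        data = list(iterable)
        return len(data), data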

+2
Jul 10 '15 at 21:17


