Getting the number of elements in an iterator in Python

Is there an efficient way to find out how many elements are in an iterator in Python, in general, without iterating through each of them and counting?

+70
python iterator
Jul 27 '10 at 16:32
14 answers

No. It's impossible.

Example:

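    import random

    def gen(n):
        # Each i is yielded with probability 1/2, so the number of
        # items produced is not known in advance.
        for i in range(n):
            if random.randint(0, 1) == 0:
                yield i

    iterator = gen(10)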

The length of the iterator is unknown until you iterate through it.

+59
Jul 27 '10 at 16:42

This code should work:

    >>> iter = (i for i in range(50))
    >>> sum(1 for _ in iter)
    50

Although it does iterate over each item and count them, it is the fastest way to do so.

+127
Jul 27 '10 at 16:35

No. Any method will require you to resolve every result. You can do

    iter_length = len(list(iterable))

but running that on an infinite iterator will of course never return. It also consumes the iterator, which will need to be reset if you want to use its contents.

Telling us what real problem you are trying to solve might help us find a better way to reach your actual goal.

Edit: Using list() reads the whole iterable into memory at once, which may be undesirable. Another way is to do

    sum(1 for _ in iterable)

as another person posted. That avoids keeping the items in memory.
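
A minimal sketch of the consumption point made above, assuming Python 3. The helper name count_items is illustrative and does not come from the thread. It shows that counting exhausts a generator, so if the items are needed afterwards they have to be kept somewhere (here in a list, with the memory cost already mentioned).

```python
def count_items(iterable):
    """Count items by consuming the iterable; nothing is stored."""
    return sum(1 for _ in iterable)


squares = (n * n for n in range(5))
first_count = count_items(squares)   # consumes the generator
second_count = count_items(squares)  # nothing is left to count

# If the items are needed afterwards, materialize them once instead.
items = list(n * n for n in range(5))
kept_count = len(items)

print(first_count, second_count, kept_count)
```

The second call returns 0 because the generator cannot be rewound; only a fresh generator or a stored copy of the items can be counted again.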

+39
Jul 27 '10 at 16:34

You cannot (unless the type of the particular iterator implements some specific methods that make it possible).

Generally, you can count an iterator's items only by consuming the iterator. Probably one of the most efficient ways:

    import itertools
    from collections import deque

    def count_iter_items(iterable):
        """
        Consume an iterable not reading it into memory; return the number of items.
        """
        counter = itertools.count()
        deque(itertools.izip(iterable, counter), maxlen=0)  # (consume at C speed)
        return next(counter)

(For Python 3.x, replace itertools.izip with zip.)
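
For reference, the Python 3 variant described in the note above might look like the following sketch. The only change from the answer's code is zip in place of itertools.izip; the comments are added here as explanation.

```python
import itertools
from collections import deque


def count_iter_items(iterable):
    """Consume an iterable without storing it; return the number of items."""
    counter = itertools.count()
    # zip pulls from `iterable` first and from `counter` second, so the
    # counter advances exactly once per item; maxlen=0 discards each pair.
    deque(zip(iterable, counter), maxlen=0)
    # The counter has produced 0..n-1, so its next value is n.
    return next(counter)


print(count_iter_items(x for x in range(100)))
```

The argument order matters: with zip(counter, iterable) the counter would be advanced once more before the exhausted iterable stops the loop, and the result would be off by one.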

+17
Feb 27 '13 at 12:22

Kinda. You could check the __length_hint__ method, but be warned that (at least up to Python 3.4, as gsnedders helpfully points out) it is an undocumented implementation detail (see the following message in the thread) that could very well vanish or summon nasal demons instead.

Otherwise, no. Iterators are just objects that only expose the next() method. You can call it as many times as required, and they may or may not eventually raise StopIteration. Luckily, this behaviour is transparent to the coder most of the time. :)
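
On Python 3.4 and later, the hint is exposed through operator.length_hint (specified in PEP 424). The sketch below assumes Python 3 and only illustrates that the value is a hint: it is exact for a list iterator, while a generator offers no hint at all and the default is returned.

```python
import operator

numbers = [10, 20, 30]
list_iter = iter(numbers)
next(list_iter)  # consume one item

# List iterators implement __length_hint__, so this is the remaining count.
remaining = operator.length_hint(list_iter)

# Generators provide no hint, so the default (0) comes back, which cannot
# be told apart from a genuinely empty iterator.
gen = (n for n in numbers)
unknown = operator.length_hint(gen)

print(remaining, unknown)
```

Because the result is only an estimate, it is suited to things like presizing a buffer, not to answering how many items an arbitrary iterator will yield.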

+15
Jul 27 '10 at 17:17

An iterator is just an object that holds a pointer to the next object to be read from some kind of buffer or stream. It is like a linked list, where you do not know how many things you have until you iterate through them. Iterators are meant to be efficient because all they do is tell you what comes next by reference instead of using indexing (but, as you saw, you lose the ability to see how many entries remain).

+9
Jul 27 '10 at 16:47

As for your initial question, the answer still is that there is generally no way to know the length of the iterator in Python.

Given that your question is motivated by an application of the pysam library, I can give a more specific answer: I am a contributor to PySAM, and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best you can do is estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but it is not a method of counting alignments in constant time.
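
The extrapolation idea can be sketched on a generic binary stream. This is an illustration only: it does not use the pysam API, the function name estimate_total_records and the sample parameter are hypothetical, and it assumes a stream whose tell() reports the bytes consumed so far and whose records are of roughly similar size.

```python
import io


def estimate_total_records(stream, total_size, sample=100):
    """Read up to `sample` records, then extrapolate from bytes consumed."""
    records_read = 0
    for _ in stream:
        records_read += 1
        if records_read >= sample:
            break
    bytes_read = stream.tell()
    if bytes_read == 0:
        return 0
    # records per byte seen so far, scaled to the whole file
    return round(records_read * total_size / bytes_read)


# Stand-in data: 1000 fixed-width lines in an in-memory binary stream.
payload = b"".join(b"record-%04d\n" % i for i in range(1000))
estimate = estimate_total_records(io.BytesIO(payload), len(payload))
print(estimate)
```

With fixed-width records the estimate is exact; with variable-length or compressed records it is only approximate, which is why it suits a progress bar and not an exact count.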

+7
Aug 17 '10 at 18:57

I like the cardinality package for this. It is very lightweight and tries to use the fastest implementation available, depending on the iterable.

Usage:

    >>> import cardinality
    >>> cardinality.count([1, 2, 3])
    3
    >>> cardinality.count(i for i in range(500))
    500
    >>> def gen():
    ...     yield 'hello'
    ...     yield 'world'
    >>> cardinality.count(gen())
    2

The actual count() implementation is as follows:

    def count(iterable):
        if hasattr(iterable, '__len__'):
            return len(iterable)

        d = collections.deque(enumerate(iterable, 1), maxlen=1)
        return d[0][0] if d else 0

+6
Apr 15 '16 at 10:32

There are two ways to get the length of "something" on a computer.

The first way is to store a count. This requires anything that touches the file/data to update it (or a class that only exposes interfaces, but it boils down to the same thing).

The other way is to iterate over it and count how big it is.
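
The two approaches can be put side by side in a small sketch. CountedBag is a hypothetical class written for this illustration; it maintains its own count by hand to make the bookkeeping visible, even though the underlying list already knows its length.

```python
class CountedBag:
    """Container that keeps a running count updated on every modification."""

    def __init__(self):
        self._items = []
        self._count = 0

    def add(self, item):
        self._items.append(item)
        self._count += 1  # every write path must maintain the count

    def __len__(self):
        return self._count  # constant time: no traversal needed

    def __iter__(self):
        return iter(self._items)


bag = CountedBag()
for word in ("a", "b", "c"):
    bag.add(word)

stored = len(bag)             # first way: read the stored count
walked = sum(1 for _ in bag)  # second way: iterate and count
print(stored, walked)
```

A bare iterator has no such stored count to consult, which leaves only the second way.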

+4
Jul 27 '10 at 16:55

Quick test:

    # Python 2, run in IPython (%timeit is an IPython magic).
    import collections
    import itertools

    def count_iter_items(iterable):
        counter = itertools.count()
        collections.deque(itertools.izip(iterable, counter), maxlen=0)
        return next(counter)

    def count_lencheck(iterable):
        if hasattr(iterable, '__len__'):
            return len(iterable)
        d = collections.deque(enumerate(iterable, 1), maxlen=1)
        return d[0][0] if d else 0

    def count_sum(iter):
        return sum(1 for _ in iter)

    # Caveat: this generator is exhausted after the first call, so later
    # timing loops run on an empty iterator.
    iter = (x for x in xrange(100))

    %timeit count_iter_items(iter)
    %timeit count_lencheck(iter)
    %timeit sum(iter)  # as posted: this times the builtin sum, not count_sum

Results:

    1000000 loops, best of 3: 553 ns per loop
    1000000 loops, best of 3: 730 ns per loop
    1000000 loops, best of 3: 246 ns per loop

I.e., the simple count_sum is the way to go.
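
A generator can be consumed only once, so a timing loop that reuses a single generator object mostly measures an empty iterator. The sketch below is one way to rerun the comparison on Python 3 with the standard timeit module, building a fresh generator for every call. The helper fresh is an assumption of this sketch, and no timings are quoted because they depend on the machine and interpreter.

```python
import collections
import itertools
import timeit


def count_iter_items(iterable):
    counter = itertools.count()
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)


def count_lencheck(iterable):
    if hasattr(iterable, "__len__"):
        return len(iterable)
    d = collections.deque(enumerate(iterable, 1), maxlen=1)
    return d[0][0] if d else 0


def count_sum(iterable):
    return sum(1 for _ in iterable)


def fresh():
    """Build a new generator so every timed call counts 100 items."""
    return (x for x in range(100))


RUNS = 10_000
for func in (count_iter_items, count_lencheck, count_sum):
    seconds = timeit.timeit(lambda: func(fresh()), number=RUNS)
    print(f"{func.__name__}: {seconds / RUNS * 1e9:.0f} ns per call")
```

Each measurement includes the cost of creating the generator, which is the same for all three functions and so does not affect their ranking.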

+2
Jun 12 '17 at 13:36

It is common practice to put this kind of information in the file header, and for pysam to give you access to it. I do not know the format, but have you checked the API?

As others have said, you cannot know the length from the iterator.

0
Jul 27 '10 at 17:37

This goes against the very definition of an iterator, which is a pointer to an object plus information about how to get to the next object.

An iterator does not know how many more times it will be able to iterate before it terminates. That number could be infinite, so infinity might be your answer.
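
When an iterator may be endless, counting can at least be bounded. The following is a sketch, assuming Python 3; the helper name count_up_to is hypothetical. It uses itertools.islice to give up once a limit is exceeded, so it always terminates.

```python
import itertools


def count_up_to(iterable, limit):
    """Return the item count, or None if more than `limit` items exist."""
    # islice stops after limit + 1 items, so this ends even on endless input.
    seen = sum(1 for _ in itertools.islice(iterable, limit + 1))
    return seen if seen <= limit else None


finite = count_up_to(iter("abc"), limit=10)
endless = count_up_to(itertools.count(), limit=10)
print(finite, endless)
```

Like every counting approach here, this consumes the items it inspects, so up to limit + 1 items are gone from the iterator afterwards.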

0
Nov 08 '13 at 0:53

    def count_iter(iterable):
        # Names chosen to avoid shadowing the builtins iter() and sum().
        total = 0
        for _ in iterable:
            total += 1
        return total

0
Feb 24 '16 at 18:06

Although it is not possible in general to do what has been asked, it is still often useful to have a count of how many items were iterated over after having iterated over them. For that, you can use jaraco.itertools.Counter or similar. Here is an example using Python 3 and rwt to load the package.

    $ rwt -q jaraco.itertools -- -q
    >>> import jaraco.itertools
    >>> items = jaraco.itertools.Counter(range(100))
    >>> _ = list(items)
    >>> items.count
    100
    >>> import random
    >>> def gen(n):
    ...     for i in range(n):
    ...         if random.randint(0, 1) == 0:
    ...             yield i
    ...
    >>> items = jaraco.itertools.Counter(gen(100))
    >>> _ = list(items)
    >>> items.count
    48
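
The "or similar" part can be covered without a third-party package. The sketch below is a hypothetical CountingIterator class written for illustration, not the jaraco.itertools implementation. It assumes Python 3 and counts items as they pass through, so the total is available once the consumer has finished with them.

```python
class CountingIterator:
    """Wrap an iterable and count the items as they are pulled through."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._iterator)  # StopIteration propagates unchanged
        self.count += 1
        return item


items = CountingIterator(range(100))
total = sum(items)   # use the items normally...
print(items.count)   # ...then read how many passed through
```

The count reflects only what has been consumed so far, so it equals the full length only after the wrapped iterator is exhausted.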

0
Aug 04 '17 at 20:05


