itertools.imap vs. map over the entire iterable

I'm curious about this example from http://docs.python.org/2/library/itertools.html#itertools.imap, namely:

sum(imap(operator.mul, vector1, vector2)) 

as an efficient dot product. I understand that imap returns a generator rather than a list, and I understand how that makes it faster and lighter on memory if you only consume the first few elements. But with the surrounding sum(), I don't see how it behaves any differently from:

 sum(map(operator.mul, vector1, vector2)) 

The difference between map and imap becomes clear once you increase the size of what you are iterating over:

    import itertools

    # xrange object, takes up almost no memory
    data = xrange(1000000000)

    # Tries to build a list of 1 billion elements!
    # Therefore, fails with MemoryError on 32-bit systems.
    doubled = map(lambda x: x * 2, data)

    # Generator-like object that lazily doubles each item as it is iterated over.
    # Takes up very little (and constant, independent of data size) memory.
    iter_doubled = itertools.imap(lambda x: x * 2, data)

    # This is where the iteration and the doubling happen.
    # Again, since no list is created, this doesn't run you out of memory.
    sum(iter_doubled)
    # (The result is 999999999000000000L, if you're interested.
    # It takes a minute or two to compute, but consumes minimal memory.)
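Python 3 has no xrange or itertools.imap, so here is a rough, scaled-down Python 3 sketch of the same experiment (an illustration, not part of the original answer). It assumes Python 3.6+, where range is lazy like xrange and the built-in map returns an iterator like imap; the standard-library tracemalloc module reports the peak memory allocated while each version runs. The helper name peak_bytes is hypothetical.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()  # also clears the traces for the next measurement
    return result, peak

data = range(100_000)  # lazy, like Python 2 xrange

# Eager: materialise every doubled value first (Python 2 map() behaviour).
eager_sum, eager_peak = peak_bytes(lambda: sum(list(map(lambda x: x * 2, data))))

# Lazy: Python 3 map() yields one doubled value at a time (like itertools.imap).
lazy_sum, lazy_peak = peak_bytes(lambda: sum(map(lambda x: x * 2, data)))

print(eager_sum, lazy_sum)
print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

Both sums are equal; only the eager version's peak grows with the length of data.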

Note that in Python 3, the built-in map behaves like Python 2's itertools.imap (which was removed because it is no longer needed). To get the "old" map behavior, use list(map(...)), which is also a good way to visualize how Python 2's itertools.imap and map differ from each other.
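A small Python 3 sketch of that contrast, using made-up vectors: map() hands out products on demand and can only be walked once, while list(map(...)) computes and stores every product up front.

```python
import operator

vector1 = [1, 2, 3]
vector2 = [4, 5, 6]

# Python 3 map(): a lazy, single-pass iterator (what itertools.imap was in Python 2).
lazy = map(operator.mul, vector1, vector2)
first = next(lazy)  # only the first product, 1 * 4, has been computed so far
rest = sum(lazy)    # sum() pulls the remaining products: 2 * 5 + 3 * 6
again = sum(lazy)   # the iterator is now exhausted, so this sums nothing

# list(map(...)): the "old" Python 2 map() behaviour, every product stored up front.
eager = list(map(operator.mul, vector1, vector2))

print(first, rest, again)             # 4 28 0
print(eager, sum(eager), sum(eager))  # [4, 10, 18] 32 32
```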


The first version computes the sum by accumulating the products one at a time. The second first computes all of the element-wise products and then, with that whole list held in memory, goes on to compute the sum. So the second version needs more memory: O(n) extra space for the list of products, versus O(1) for the first.
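sum() is implemented in C, but it consumes its argument one item at a time, so the two versions can be modelled roughly as the explicit loops below (a simplified sketch in Python 3, where map() is lazy like imap; the function names are hypothetical).

```python
import operator

def dot_lazy(v1, v2):
    # Models sum(imap(...)): O(1) extra memory.
    total = 0
    for product in map(operator.mul, v1, v2):  # each product is computed on demand
        total += product  # and can be freed once it has been added
    return total

def dot_eager(v1, v2):
    # Models Python 2 sum(map(...)): O(n) extra memory.
    products = list(map(operator.mul, v1, v2))  # all n products exist at once
    total = 0
    for product in products:
        total += product
    return total

print(dot_lazy([1, 2, 3], [4, 5, 6]), dot_eager([1, 2, 3], [4, 5, 6]))  # 32 32
```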


Another thing to note is that "uses a lot less memory" often means "faster." The lazy (iterator) version consumes each product immediately after computing it, adding it to the running sum. The product and the running sum are almost certainly in the L1 cache. If you compute all the products first, then, depending on the number of elements, it's almost certain that the earliest products will have been evicted from the L1 cache, then from the L2 cache, and so on, so that on the second pass, when they are finally added together, all the products sit far down the memory hierarchy (and, in extreme cases, have to be read back from the page file).
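To check the cache argument on your own machine, a rough timing harness like the following may help. It is a sketch using Python 3's timeit, with made-up random data and hypothetical function names timed_lazy and timed_eager. Absolute numbers, and even the size of the gap, will vary with hardware and interpreter; in CPython, per-element interpreter overhead and object allocation may dominate the cache effects, so no particular result is claimed here.

```python
import operator
import random
import timeit

# Hypothetical benchmark data: one million random floats per vector.
n = 1_000_000
vector1 = [random.random() for _ in range(n)]
vector2 = [random.random() for _ in range(n)]

def timed_lazy():
    # Each product is added to the running total right after it is computed.
    return sum(map(operator.mul, vector1, vector2))

def timed_eager():
    # All n products are materialised in a list before summing (Python 2 map()).
    return sum(list(map(operator.mul, vector1, vector2)))

lazy_time = timeit.timeit(timed_lazy, number=5)
eager_time = timeit.timeit(timed_eager, number=5)
print(f"lazy:  {lazy_time:.3f} s for 5 runs")
print(f"eager: {eager_time:.3f} s for 5 runs")
```

Both functions add the same products in the same order, so they return exactly the same float; only the timing can differ.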

That said, I'm not sure what you mean by "don't see how it behaves differently than". The final result is the same either way.


The difference lies in how the output of imap(...) or map(...) is passed to sum(). You write that imap returns a generator, but I think you might be under the impression that sum(map(...)) has some shortcut that makes it behave the same way. It doesn't: map() builds the complete list of results before anything is passed to sum(), whereas with imap, sum() receives each result as soon as it is computed.
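One way to see this is to log when each product is computed and when sum() receives it. The sketch below is Python 3 (lazy map() stands in for imap, list(map(...)) for Python 2's map()); trace_order and its helpers are hypothetical names used only for illustration.

```python
import operator

def trace_order(eager):
    """Return (total, events) showing when products are computed vs. summed."""
    events = []

    def traced_mul(a, b):
        events.append("mul")  # a product was just computed
        return operator.mul(a, b)

    def traced_feed(iterable):
        for value in iterable:
            events.append("add")  # sum() is about to receive this value
            yield value

    v1, v2 = [1, 2, 3], [4, 5, 6]
    products = map(traced_mul, v1, v2)  # lazy, like itertools.imap
    if eager:
        products = list(products)  # Python 2 map(): build the whole list first
    total = sum(traced_feed(products))
    return total, events

print(trace_order(eager=False))  # (32, ['mul', 'add', 'mul', 'add', 'mul', 'add'])
print(trace_order(eager=True))   # (32, ['mul', 'mul', 'mul', 'add', 'add', 'add'])
```

With the lazy version, computing and summing interleave; with the eager version, every product exists before sum() sees the first one.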

+2
source share
