Generator speed in Python 3

I am looking at a link about generators that someone posted. The author first compares the two functions below, and on his installation he saw roughly a 5% speed increase with the generator version.

I am running Windows XP with Python 3.1.1 and cannot seem to reproduce his results. The "old way" (logs1) keeps coming out a bit faster when I test with the provided logs and with up to 1 GB of duplicated data.

Can someone help me understand what is happening differently?

Thanks!

```python
def logs1():
    wwwlog = open("big-access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total

def logs2():
    wwwlog = open("big-access-log")
    bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)
```

* edit: the indentation got mangled in copy/paste.
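For anyone who wants to check the numbers themselves, here is a minimal timing sketch. It is not the author's benchmark: it uses a synthetic in-memory log (my own stand-in for the real big-access-log file) so it is self-contained, and the two functions are adapted to take the log as an argument rather than opening a file.

```python
import timeit

# Synthetic stand-in for "big-access-log": mostly numeric byte counts,
# with every 50th entry replaced by a '-' (no bytes transferred).
lines = ['1.2.3.4 - - "GET / HTTP/1.0" 200 %d\n' % (i % 1000)
         for i in range(100000)]
lines[::50] = ['1.2.3.4 - - "GET / HTTP/1.0" 304 -\n'] * len(lines[::50])

def logs1(log):
    # Plain loop: accumulate the byte column directly.
    total = 0
    for line in log:
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
    return total

def logs2(log):
    # Chained generator expressions, summed at the end.
    bytecolumn = (line.rsplit(None, 1)[1] for line in log)
    getbytes = (int(x) for x in bytecolumn if x != '-')
    return sum(getbytes)

# Both versions must agree before any timing is meaningful.
assert logs1(lines) == logs2(lines)

print("logs1:", timeit.timeit(lambda: logs1(lines), number=10))
print("logs2:", timeit.timeit(lambda: logs2(lines), number=10))
```

Which version wins can vary with the Python version and platform, which is rather the point of the question.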

+7
python
3 answers

For what it's worth, the main point of the speed comparison in the presentation was to show that using generators does not introduce a huge performance overhead. Many programmers, when they first see generators, start wondering about hidden costs. For example, is there all sorts of crazy magic going on behind the scenes? Is using this feature going to make my program run twice as slow?

In general, the answer is no. The example was meant to show that the generator solution runs at essentially the same speed, if not slightly faster (although this depends on the situation, the Python version, and so on). If you are observing huge performance differences between the two versions, then that would be something worth investigating.

+8

In the David Beazley slides you linked to, he states that all tests were performed with "Python 2.5.1 on OS X 10.4.11", while you say you are running your tests with Python 3.1 on Windows XP. So, understand that you are making something of an apples-to-oranges comparison. I suspect that of the two variables, the Python version matters far more.

Python 3 is a different beast from Python 2. Many things have changed under the hood (even within the Python 2 branch). This includes performance optimizations as well as performance regressions (see, for example, Beazley's recent blog post on I/O in Python 3). For this reason, the Python Performance Tips page states explicitly:

You should always check these tips with your application and the version of Python you intend to use, and not just blindly accept that one method is faster than another.

I should mention that one area where you can count on generators for help is in reducing memory consumption rather than CPU consumption. If you have a large amount of data where you compute or extract something from each individual piece, and you do not need the data afterwards, generators shine. See the presentation for more details on generators.
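To illustrate that memory point with a minimal sketch (the size and the squaring operation here are arbitrary examples of mine, not from the slides): a list comprehension materializes every intermediate value at once, while a generator expression holds only its own machinery, regardless of how much data flows through it.

```python
import sys

n = 1000000

# List comprehension: builds all n intermediate values in memory at once.
squares_list = [x * x for x in range(n)]

# Generator expression: a small fixed-size object; values are produced lazily.
squares_gen = (x * x for x in range(n))

# The list's size grows with n; the generator's does not.
print("list:", sys.getsizeof(squares_list), "bytes")
print("gen: ", sys.getsizeof(squares_gen), "bytes")

# Both yield the same result when consumed.
assert sum(squares_list) == sum(x * x for x in range(n))
```

Note that `sys.getsizeof` only measures the container itself, but that is exactly the contrast here: the list must reference every element, while the generator produces them one at a time.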

+1

You have had no answer in almost half an hour, so I am posting something that makes sense to me, though not necessarily the right answer. After half an hour, I figure it is better than nothing:

The first algorithm uses a generator. The generator loads the first page of results from the list (into memory) and keeps loading successive pages (into memory) until nothing remains of the input.

The second algorithm uses two generators, each with its own if, which means two comparisons per loop iteration, unlike the first algorithm.

Also, the second algorithm calls the sum function at the end, whereas the first algorithm simply keeps adding up the relevant integers as it runs into them.

Thus, for sufficiently large inputs, the second algorithm performs more comparisons and makes an extra function call compared to the first. This might explain why it takes longer to complete than the first algorithm.

Hope this helps.

0
