This is not an answer, but just a collection of organized experiments.
This is quite interesting: there does seem to be something suspicious going on in Python's memory allocator.
Here is my attempt at reducing the test case:
    from math import sqrt
    from time import time

    def sieve(k):
        s = [True] * k
        for i in xrange(3, int(sqrt(k)) + 2, 2):
            for j in range(i ** 2, k, i * 2):
                s[j] = False
        return [i for i in range(3, k, 2) if s[i]]

    st = time()
    for x in range(1000):
        rt = time()
        sieve(10000000)
        print "%3d %.2f %.2f" % (x, time() - rt, (time() - st) / (x + 1))
Note that if I remove the if s[i] check, make the inner range an xrange, make the return value a generator, or put just pass in the inner for loop (or change it to s[j] = True), the behavior disappears and the times stay flat.
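For example, this is the xrange variant I mean (a sketch, with a hypothetical name sieve_xrange; the only change from the sieve above is that the inner loop no longer builds a temporary list on every pass):

    def sieve_xrange(k):
        # same as sieve() above, but the inner loop uses xrange so that
        # no temporary list is allocated for each candidate prime
        s = [True] * k
        for i in xrange(3, int(sqrt(k)) + 2, 2):
            for j in xrange(i ** 2, k, i * 2):
                s[j] = False
        return [i for i in range(3, k, 2) if s[i]]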
Python's memory usage grows steadily over the repeated calls and eventually reaches a plateau (at which point the per-call times also plateau, at roughly 250% of their initial values).
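A rough way to watch that growth from inside the loop (a sketch; it assumes a Unix-like platform where the resource module is available, and note that ru_maxrss is reported in kilobytes on Linux but in bytes on OS X) is to replace the timing loop above with:

    from resource import getrusage, RUSAGE_SELF

    for x in range(1000):
        rt = time()
        sieve(10000000)
        # ru_maxrss is the peak resident set size of the process so far;
        # it keeps climbing with each call until the plateau is reached
        print "%3d %.2f %d" % (x, time() - rt, getrusage(RUSAGE_SELF).ru_maxrss)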
My hypothesis is that the large number of inner range lists (of steadily decreasing size), plus the resulting output list, causes some kind of worst-case heap fragmentation that makes further allocations very slow.
My recommendation would be to produce a reduced test case and file it as a bug with the Python developers (bugs.python.org).
nneonneo