PyPy: Serious performance hit when using None in an integer list

Since the algorithm I want to implement uses indices 1..n, and shifting every index by one is very error-prone, I decided to get smart and inserted a dummy element at the beginning of each list, so that I can use the original formula from the article.

For brevity, consider this toy example:

    def calc(N):
        nums = [0] + range(1, N+1)
        return sum(nums[1:])  # skip the first element

However, I was worried that my results might be wrong because I could accidentally access the 0th element without noticing it. So I became even smarter and used None instead of 0 as the first element: any arithmetic operation on it raises a runtime error.

    def calc_safe(N):
        nums = [None] + range(1, N+1)  # here we use None as the dummy
        return sum(nums[1:])

Surprisingly, this small change causes a huge performance hit for PyPy (even with the current version 5.8): the code becomes about 10 times slower! Here are the timings on my machine:

                        pypy-5.8    cpython
    calc(10**8)          0.5 sec    5.5 sec
    calc_safe(10**8)     7.5 sec    5.5 sec
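
For reference, a minimal timing harness along these lines reproduces the comparison (the single-run time.time() measurement is my assumption, not necessarily how the numbers above were obtained):

    import time

    def measure(fn, arg):
        # crude single-run timer; good enough for runs that take seconds
        start = time.time()
        fn(arg)
        return time.time() - start

    for f in (calc, calc_safe):
        print f.__name__, measure(f, 10**8), "sec"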

As a side note: CPython doesn't care whether the dummy is None or not.

So my question is twofold:

  • Obviously, using None here is not a good idea, but why?
  • Is there a way to keep the safety of the None approach without losing performance?

Edit: As Armin explained, not all lists are equal; we can see which strategy a list uses via:

    import __pypy__
    print __pypy__.strategy(nums)

In the first case it is IntegerListStrategy, in the second ObjectListStrategy. The same thing happens if we use a large integer (e.g. 2**100) instead of None.
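
A quick way to see all three cases side by side (PyPy only; the __pypy__ module does not exist on CPython):

    import __pypy__

    print __pypy__.strategy([0] + range(1, 10))       # IntegerListStrategy
    print __pypy__.strategy([None] + range(1, 10))    # ObjectListStrategy
    print __pypy__.strategy([2**100] + range(1, 10))  # ObjectListStrategy (too big for compact int storage)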

performance python pypy

1 answer

PyPy has a special case for lists containing only integers: it stores them like an array.array. If the list contains a None, this optimization no longer works.
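
A small sketch of this degradation, using the strategy introspection from the question (PyPy only):

    import __pypy__

    nums = range(1, 10**6)
    print __pypy__.strategy(nums)   # IntegerListStrategy: ints stored compactly

    nums.insert(0, None)            # a single None forces the generic object representation
    print __pypy__.strategy(nums)   # ObjectListStrategy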

Perhaps this could be fixed inside PyPy to allow None as a special case...
