sum could be faster on lists

This is somewhat of a follow-up to this question.

You will notice that you cannot use sum on a list of strings to concatenate them. Python tells you to use str.join instead, and that is good advice, because however you use + on strings, performance is bad.
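As a quick illustration of that behavior, here is a minimal sketch (the variable names are only for illustration) showing sum rejecting strings while str.join does the job:

```python
words = ["abc", "def", "ghi"]

# sum() refuses a str start value and raises TypeError,
# with a message that points to str.join instead.
try:
    sum(words, "")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# str.join builds the concatenated string directly.
joined = "".join(words)

print(error_message)
print(joined)
```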

The "cannot use sum" restriction does not apply to list, although itertools.chain.from_iterable is the preferred way to do this kind of list flattening.

But sum(x, []), when x is a list of lists, is definitely bad.

But should it stay that way?

I compared 3 approaches:

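    import time
    import itertools

    a = [list(range(1, 1000)) for _ in range(1000)]

    # 1. built-in sum with an empty list as the start value
    start = time.time()
    sum(a, [])
    print(time.time() - start)

    # 2. itertools.chain.from_iterable
    start = time.time()
    list(itertools.chain.from_iterable(a))
    print(time.time() - start)

    # 3. manual accumulation with in-place addition
    start = time.time()
    z = []
    for s in a:
        z += s
    print(time.time() - start)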

results:

  • sum() on the list of lists: 10.46647310256958. OK, we knew that.
  • itertools.chain: 0.07705187797546387
  • custom accumulated sum using in-place addition: 0.057044029235839844 (it can be faster than itertools.chain, as you can see)

So sum lags far behind because it does result = result + b instead of result += b.
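The difference between the two forms can be seen directly in a small sketch (the names below are illustrative): + builds a new list at every step, copying everything accumulated so far, while += extends the existing list in place.

```python
a = [[1, 2], [3, 4], [5, 6]]

# result = result + b: every step allocates a new list
# and copies all elements accumulated so far.
result = []
first = result
for b in a:
    result = result + b
rebound = result is not first  # result now names a different object

# result += b: list.__iadd__ extends the same list object in place.
z = []
same = z
for b in a:
    z += b
in_place = z is same  # z is still the original object

print(result == z, rebound, in_place)
```

Because the first loop copies a growing list on every iteration, its total work grows quadratically with the number of elements, which matches the timings above.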

So now my question is:

Why can't sum use this accumulative approach when it is available?

(This would be transparent to existing applications and would make it possible to use the built-in sum function to flatten lists efficiently.)

3 answers

We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.

FWIW, you should get reasonable performance with this simple code:

    result = []
    for seq in mylists:
        result += seq

For your other question, "why can't sum use this accumulative approach when it is available?", see this comment for builtin_sum() in Python/bltinmodule.c:

    /* It's tempting to use PyNumber_InPlaceAdd instead of
       PyNumber_Add here, to avoid quadratic running time
       when doing 'sum(list_of_lists, [])'.  However, this
       would produce a change in behaviour: a snippet like

         empty = []
         sum([[x] for x in range(10)], empty)

       would change the value of empty. */
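To make that comment concrete, here is a sketch using a hypothetical pure-Python sum_inplace (not a real built-in, only an illustration of what an in-place variant would do) next to the real sum:

```python
def sum_inplace(iterable, start):
    # Hypothetical variant of sum() that uses += (in-place add)
    # instead of +, so a mutable start value gets modified.
    result = start
    for item in iterable:
        result += item
    return result


empty = []
total = sum([[x] for x in range(10)], empty)
empty_after_sum = list(empty)  # snapshot: the real sum() leaves empty alone

flat = sum_inplace([[x] for x in range(10)], empty)
empty_after_inplace = list(empty)  # snapshot: the start value was mutated

print(empty_after_sum)
print(empty_after_inplace)
```

With the in-place variant the caller's empty list ends up holding the whole result, which is exactly the change in behaviour the comment warns about.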
    /* It's tempting to use PyNumber_InPlaceAdd instead of
       PyNumber_Add here, to avoid quadratic running time
       when doing 'sum(list_of_lists, [])'.  However, this
       would produce a change in behaviour: a snippet like

         empty = []
         sum([[x] for x in range(10)], empty)

       would change the value of empty. */
    temp = PyNumber_Add(result, item);

Taken from the source code of the Python built-ins: https://github.com/python/cpython/blob/master/Python/bltinmodule.c#L2146 Line: 2342


FWIW, we can trick the interpreter into letting us use sum on strings by passing an instance of a suitable custom class as the start argument to sum.

    class Q(object):
        def __init__(self, data=''):
            self.data = str(data)

        def __str__(self):
            return self.data

        def __add__(self, other):
            return Q(self.data + str(other))

    print(sum(['abc', 'def', 'ghi'], Q()))

Output

 abcdefghi 

Of course this is pretty stupid. :)
