The most efficient way to convert list items to int and sum them

I am doing something like this to sum several elements of each line:

for line in open(filename, 'r'):
    big_list = line.strip().split(delim)
    a = sum(int(float(item)) for item in big_list[start:end] if item)
    # do some other stuff

This is done line by line on a large file, where some elements may be missing, i.e. equal to ''. If I use the above expression to compute a, the script becomes much slower than without it. Is there any way to speed it up?

2 answers

As Padraic commented, you can use filter to drop the empty strings and then discard the "if item" check:

>>> import timeit
>>> timeit.timeit("sum(int(float(item)) for item in ['','3.4','','','1.0'] if item)",number=10000)
0.04612559381553183
>>> timeit.timeit("sum(int(float(item)) for item in filter(None, ['','3.4','','','1.0']))",number=10000)
0.04827789913997549
>>> sum(int(float(item)) for item in filter(None, ['','3.4','','','1.0']))
4
>>> 

It is counterproductive in this example, but it may turn out faster in your context. Measure to see.
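
A minimal sketch of how one might compare the two variants on your own data (the sample list is made up; substitute a real slice of big_list and adjust the repetition count):

import timeit

sample = ['', '3.4', '', '', '1.0']  # hypothetical stand-in for big_list[start:end]

def with_guard():
    # original version: skip empty strings with the "if item" check
    return sum(int(float(item)) for item in sample if item)

def with_filter():
    # alternative: let filter drop the empty strings
    return sum(int(float(item)) for item in filter(None, sample))

print(timeit.timeit(with_guard, number=10000))
print(timeit.timeit(with_filter, number=10000))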

See also this answer.


If the values always look like "6.0" or "1.2", you do not need the detour through float; you can pull out the integer part with a regular expression instead:

import re

pattern = re.compile(r"\d+")

and then sum the matches without calling float:

sum(int(pattern.search(item).group(0)) for item in big_list[start:end] if item)
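
A quick interactive check, reusing the illustrative values from the first answer (the "if item" guard is still needed, since search returns None for an empty string):

>>> import re
>>> pattern = re.compile(r"\d+")
>>> sum(int(pattern.search(item).group(0)) for item in ['', '3.4', '', '', '1.0'] if item)
4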

, " " big_list. , , "6.0,,1.2,3.0,". :

delim = ","
pattern = re.compile("(\d+)\.\d+|" + re.escape(delim) + re.escape(delim) + "|$")

findall then returns ['6', '', '1', '3', ''], so you can sum directly, again without split or float:

for line in open(filename, 'r'):
    big_list = pattern.findall(line)
    a = sum(int(item) for item in big_list[start:end] if item)
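
For illustration, what findall produces on the hypothetical line used above:

>>> import re
>>> delim = ","
>>> pattern = re.compile(r"(\d+)\.\d+|" + re.escape(delim) + re.escape(delim) + "|$")
>>> pattern.findall("6.0,,1.2,3.0,")
['6', '', '1', '3', '']
>>> sum(int(item) for item in pattern.findall("6.0,,1.2,3.0,") if item)
10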
