I am trying to count unique values in a numpy array.
    import numpy as np
    from collections import defaultdict
    import scipy.stats
    import time

    x = np.tile([1,2,3,4,5,6,7,8,9,10],20000)
    for i in [44,22,300,403,777,1009,800]:
        x[i] = 11

    def getCounts(x):
        counts = defaultdict(int)
        for item in x:
            counts[item] += 1
        return counts

    flist = [getCounts, scipy.stats.itemfreq]

    for f in flist:
        print f
        t1 = time.time()
        y = f(x)
        t2 = time.time()
        print y
        print '%.5f sec' % (t2-t1)
At first I could not find a built-in function for this, so I wrote getCounts(). Then I found scipy.stats.itemfreq and thought I would use that instead. But it is slow! This is what I get on my PC. Why is it so slow compared to such a simple handwritten function?
    <function getCounts at 0x0000000013C78438>
    defaultdict(<type 'int'>, {1: 19998, 2: 20000, 3: 19999, 4: 19999, 5: 19999, 6: 20000, 7: 20000, 8: 19999, 9: 20000, 10: 19999, 11: 7})
    0.04700 sec
    <function itemfreq at 0x0000000013C5D208>
    [[ 1.00000000e+00 1.99980000e+04]
     [ 2.00000000e+00 2.00000000e+04]
     [ 3.00000000e+00 1.99990000e+04]
     [ 4.00000000e+00 1.99990000e+04]
     [ 5.00000000e+00 1.99990000e+04]
     [ 6.00000000e+00 2.00000000e+04]
     [ 7.00000000e+00 2.00000000e+04]
     [ 8.00000000e+00 1.99990000e+04]
     [ 9.00000000e+00 2.00000000e+04]
     [ 1.00000000e+01 1.99990000e+04]
     [ 1.10000000e+01 7.00000000e+00]]
    2.04100 sec
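For comparison, here is a minimal sketch of the same count done with `np.unique` and its `return_counts` argument. It assumes NumPy 1.9 or later, where that argument is available; `count_values` is a hypothetical helper name, not something from the code above, and no timing is claimed for it.

```python
import numpy as np

def count_values(x):
    """Return a dict mapping each distinct value in x to how often it occurs."""
    # np.unique sorts the array and, with return_counts=True, also returns
    # the number of occurrences of each distinct value (NumPy >= 1.9).
    values, counts = np.unique(x, return_counts=True)
    # Convert to plain Python ints so the result compares equal to the
    # defaultdict produced by a hand-written counting loop.
    return dict(zip(values.tolist(), counts.tolist()))

# Same test data as in the question.
x = np.tile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 20000)
for i in [44, 22, 300, 403, 777, 1009, 800]:
    x[i] = 11

result = count_values(x)
print(result)
```

The result is a plain dict, so it can be checked directly against what getCounts() returns for the same array.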
Jason s