With numpy you can write something yourself, or use a ready-made groupby function (rec_groupby from matplotlib.mlab, but this is much slower; for more powerful groupby functionality, have a look at pandas, sketched at the end of this answer). I compared both numpy options with Michael Dunn's dictionary-based answer:
    import numpy as np
    import random
    from matplotlib.mlab import rec_groupby

    listA = [random.choice("abcdef") for i in range(20000)]
    listB = [20 * random.random() for i in range(20000)]

    names = np.array(listA)
    values = np.array(listB)

    def f_dict(listA, listB):
        d = {}
        for a, b in zip(listA, listB):
            d.setdefault(a, []).append(b)
        avg = []
        for key in d:
            avg.append(sum(d[key]) / len(d[key]))
        return d.keys(), avg

    def f_numpy(names, values):
        result_names = np.unique(names)
        result_values = np.empty(result_names.shape)
        for i, name in enumerate(result_names):
            result_values[i] = np.mean(values[names == name])
        return result_names, result_values
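The struct_array used with rec_groupby below is not defined in the snippet above; one way to build it (assuming it is simply a record array combining the same two columns, which is my guess rather than part of the original code) is:

    # assumption: struct_array is a record array with 'names' and 'values' fields
    struct_array = np.rec.fromarrays([names, values], names='names,values')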
This is the result for the three approaches:
    In [2]: f_dict(listA, listB)
    Out[2]:
    (['a', 'c', 'b', 'e', 'd', 'f'],
     [9.9003182717213765,
      10.077784850173568,
      9.8623915728699636,
      9.9790599744319319,
      9.8811096512807097,
      10.118695410115953])

    In [3]: f_numpy(names, values)
    Out[3]:
    (array(['a', 'b', 'c', 'd', 'e', 'f'], dtype='|S1'),
     array([  9.90031827,   9.86239157,  10.07778485,   9.88110965,
              9.97905997,  10.11869541]))

    In [7]: rec_groupby(struct_array, ('names',), (('values', np.mean, 'resvalues'),))
    Out[7]:
    rec.array([('a', 9.900318271721376), ('b', 9.862391572869964),
           ('c', 10.077784850173568), ('d', 9.88110965128071),
           ('e', 9.979059974431932), ('f', 10.118695410115953)],
          dtype=[('names', '|S1'), ('resvalues', '<f8')])
And for this test numpy is roughly twice as fast as the dictionary approach (while the pre-packaged rec_groupby function is much slower):
    In [32]: %timeit f_dict(listA, listB)
    10 loops, best of 3: 23 ms per loop

    In [33]: %timeit f_numpy(names, values)
    100 loops, best of 3: 9.78 ms per loop

    In [8]: %timeit rec_groupby(struct_array, ('names',), (('values', np.mean, 'values'),))
    1 loops, best of 3: 203 ms per loop
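For reference, the pandas approach mentioned at the top would look roughly like this (a minimal sketch, not part of the benchmark above; the DataFrame and column names are just illustrative):

    import pandas as pd

    # same data as above, as a two-column DataFrame
    df = pd.DataFrame({"names": listA, "values": listB})

    # grouped mean per name; returns a Series indexed by the group keys
    grouped_means = df.groupby("names")["values"].mean()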