@metatoaster has already pointed this out: go for Counter (from collections). It is fast.
import pandas as pd
from collections import Counter
import timeit
import numpy as np

df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])
Timings
%timeit -n 10000 df['NumA'].value_counts()
# 10000 loops, best of 3: 715 µs per loop

%timeit -n 10000 df['NumA'].value_counts().to_dict()
# 10000 loops, best of 3: 796 µs per loop

%timeit -n 10000 Counter(df['NumA'])
# 10000 loops, best of 3: 74 µs per loop

%timeit -n 10000 df.groupby(['NumA']).count()
# 10000 loops, best of 3: 1.29 ms per loop
Hurrah!
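If you have not used Counter before, here is a minimal sketch of what it gives you. The list num_a is a made-up stand-in for the values of df['NumA'] (my assumption, not part of the timings above); Counter accepts any iterable, so passing the column itself works the same way.

```python
from collections import Counter

# Hypothetical stand-in for the values of df['NumA'];
# any iterable works, including a pandas Series.
num_a = [7, 3, 7, 1, 3, 7]

# Counter maps each distinct value to the number of times it occurs.
counts = Counter(num_a)

# most_common() returns (value, count) pairs sorted by count, descending,
# which is roughly the ordering value_counts() gives you.
by_frequency = counts.most_common()

print(dict(counts))    # plain dict, comparable to value_counts().to_dict()
print(by_frequency)
print(counts[42])      # a value that never occurs counts as 0, no KeyError
```

Since Counter is a dict subclass, you can use it anywhere a dict is expected without calling to_dict() first.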
dragonfire_007 Dec 25 '18 at 8:34