Pandas filter if the name is displayed in the column more than n times

this is my dataframe

df = pd.DataFrame({'Col1':['Joe','Bob','Joe','Joe'],
                  'Col2':[55,25,88,80]})

I only need names if they appear more than once in 'Col1'

I can do it like this:

grouped = df.groupby("Col1")
grouped.filter(lambda x: x["Col1"].count()>2)['Col1'].unique()

However this is ugly code

Is there an easier way to clean it?

+4
source share
2 answers

Use value_countsandisin

vc = df.Col1.value_counts() > 2
vc = vc[vc]

df.loc[df.Col1.isin(vc.index)]

enter image description here

+3
source

Here's a NumPy based solution using np.unique-

unq,count = np.unique(df.Col1,return_counts=True)
out = unq[count>n]

Run Example -

In [34]: df
Out[34]: 
  Col1  Col2
0  Joe    55
1  Bob    25
2  Joe    88
3  Joe    80
4  Joe    44
5  Bob    24
6  Joe    45

In [35]: unq,count = np.unique(df.Col1,return_counts=True)

In [36]: unq[count>3] # n = 3
Out[36]: array(['Joe'], dtype=object)
+3
source

All Articles