Group and find top n value_counts pandas

Question


I have a DataFrame of taxi data with three columns that looks like this:

    Neighborhood    Borough        Time
    Midtown         Manhattan      X
    Melrose         Bronx          Y
    Grant City      Staten Island  Z
    Midtown         Manhattan      A
    Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this:

 df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:

    borough
    Bronx          High Bridge          3424
                   Mott Haven           2515
                   Concourse Village    1443
                   Port Morris          1153
                   Melrose               492
                   North Riverdale       463
                   Eastchester           434
                   Concourse             395
                   Fordham               252
                   Wakefield             214
                   Kingsbridge           212
                   Mount Hope            200
                   Parkchester           191
    ......
    Staten Island  Castleton Corners       4
                   Dongan Hills            4
                   Eltingville             4
                   Graniteville            4
                   Great Kills             4
                   Castleton               3
                   Woodrow                 1

How can I filter it to get only the top 5 for each borough? I know there are several questions with a similar title, but they did not help me.

+7

python pandas dataframe

ytk Feb 12 '16 at 14:06


2 answers

You can do this in a single line by slightly extending the original groupby with nlargest:

    >>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
    Borough        Neighborhood    Neighborhood
    Bronx          Melrose         Melrose           1
    Manhattan      Midtown         Midtown           1
    Manhatten      Lincoln Square  Lincoln Square    1
                   Midtown         Midtown           1
    Staten Island  Grant City      Grant City        1
    dtype: int64

+3

Alexander Feb 12 '16 at 16:56


jezrael · Accepted Answer · 2016-02-12T14:18:06+0000

I think you can use nlargest; change 1 to 5 to get the top 5:

    s = df['Neighborhood'].groupby(df['Borough']).value_counts()
    print(s)
    Borough
    Bronx          Melrose            7
    Manhattan      Midtown           12
                   Lincoln Square     2
    Staten Island  Grant City        11
    dtype: int64

    # Group by the Borough level only, then keep the largest count per borough
    print(s.groupby(level=0).nlargest(1))
    Bronx          Bronx          Melrose        7
    Manhattan      Manhattan      Midtown       12
    Staten Island  Staten Island  Grant City    11
    dtype: int64

Note that an additional index level was created in the result; it holds the group key (the borough), which is why each borough appears twice.
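
If the repeated borough level in the output above is unwanted, one option is to pass group_keys=False to the second groupby. The following is only a minimal sketch: the sample rows and the helper name top_n_neighborhoods are made up for illustration, and it assumes the real DataFrame has the same Borough and Neighborhood columns.

```python
import pandas as pd

# Hypothetical sample rows; the real taxi data is assumed to share these column names.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Midtown", "Midtown", "Lincoln Square",
                     "Lincoln Square", "Chelsea", "Melrose", "Melrose",
                     "Mott Haven", "Grant City"],
    "Borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                "Manhattan", "Manhattan", "Bronx", "Bronx",
                "Bronx", "Staten Island"],
})


def top_n_neighborhoods(frame, n=5):
    """Return the n most frequent neighborhoods within each borough.

    Illustrative helper, not part of pandas.
    """
    # Count pickups per (Borough, Neighborhood) pair.
    counts = frame.groupby("Borough")["Neighborhood"].value_counts()
    # Regroup by borough only and keep the n largest counts in each group.
    # group_keys=False stops pandas from prepending the borough a second time.
    return counts.groupby(level="Borough", group_keys=False).nlargest(n)


top2 = top_n_neighborhoods(df, n=2)
print(top2)
```

The result keeps the two-level (Borough, Neighborhood) index, so a single pair can be looked up with top2.loc[("Manhattan", "Midtown")].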
