Pandas DataFrame: group by two columns and get counts

I have a pandas dataframe in the following format:

    df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
                       list('AAABBBBABCBDDD'),
                       [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
                       ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
                       ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1']]).T
    df.columns = ['col1', 'col2', 'col3', 'col4', 'col5']

DF:

       col1 col2 col3     col4 col5
    0   1.1    A  1.1    x/y/z    1
    1   1.1    A  1.7      x/y    3
    2   1.1    A  2.5  x/y/z/n    3
    3   2.6    B  2.6      x/u    2
    4   2.5    B  3.3        x    4
    5   3.4    B  3.8    x/u/v    2
    6   2.6    B    4    x/y/z    5
    7   2.6    A  4.2        x    3
    8   3.4    B  4.3  x/u/v/b    6
    9   3.4    C  4.5        -    3
    10  2.6    B  4.6      x/y    5
    11  1.1    D  4.7    x/y/z    1
    12  1.1    D  4.7        x    1
    13  3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns as follows:

    df.groupby(['col5','col2']).reset_index()

Output:

              index col1 col2 col3     col4 col5
    col5 col2
    1    A    0        0  1.1    A  1.1    x/y/z    1
         D    0       11  1.1    D  4.7    x/y/z    1
              1       12  1.1    D  4.7        x    1
              2       13  3.3    D  4.8  x/u/v/w    1
    2    B    0        3  2.6    B  2.6      x/u    2
              1        5  3.4    B  3.8    x/u/v    2
    3    A    0        1  1.1    A  1.7      x/y    3
              1        2  1.1    A  2.5  x/y/z/n    3
              2        7  2.6    A  4.2        x    3
         C    0        9  3.4    C  4.5        -    3
    4    B    0        4  2.5    B  3.3        x    4
    5    B    0        6  2.6    B    4    x/y/z    5
              1       10  2.6    B  4.6      x/y    5
    6    B    0        8  3.4    B  4.3  x/u/v/b    6

I want to get a count for each group, as shown below. Expected result:

    col5 col2  count
    1    A     1
    1    D     3
    2    B     2
    etc...

How do I get the expected result? And how do I find the largest count for each value of col2?

+103
python pandas dataframe
Jul 16 '13 at 14:19
6 answers

Following on from @Andy's answer, you can do the following to solve your second question:

    In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
    Out[56]:
          0
    col2
    A     3
    B     2
    C     1
    D     3
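If the [[0]] column selection looks cryptic, here is a small variation of the same idea (my addition, not part of the original answer, assuming the df defined in the question) that names the size column explicitly via reset_index(name=...):

    # Same result, but with the count column named explicitly instead of the default 0 label
    counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')
    counts.groupby('col2')['count'].max()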
+70
Jul 16 '13 at 14:53

Are you looking for size?

    In [11]: df.groupby(['col5', 'col2']).size()
    Out[11]:
    col5  col2
    1     A       1
          D       3
    2     B       2
    3     A       3
          C       1
    4     B       1
    5     B       2
    6     B       1
    dtype: int64
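If you also want the output shaped exactly like the expected result in the question, with an explicit count column, one way (an addition of mine, not part of the original answer) is to reset the index of that Series:

    # Turn the size() Series into a DataFrame with a named 'count' column
    df.groupby(['col5', 'col2']).size().reset_index(name='count')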



To get the same answer as waitkuo (to the "second question"), but a little cleaner, group by the level:

    In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
    Out[12]:
    col2
    A    3
    B    2
    C    1
    D    3
    dtype: int64
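A minor readability variant (my addition): the level can also be referred to by name rather than position, since the grouped Series keeps col5 and col2 as index level names:

    # Same as level=1, but naming the index level makes the intent explicit
    df.groupby(['col5', 'col2']).size().groupby(level='col2').max()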
+97
Jul 16 '13 at 14:37

Put the data into a pandas DataFrame and assign column names:

    import pandas as pd

    df = pd.DataFrame([['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
                       ['ONE', 'TWO', 'ONE', 'ONE', 'ONE', 'TWO', 'ONE', 'TWO', 'ONE', 'THREE']]).T
    df.columns = ['Alphabet', 'Words']
    print(df)  # print the DataFrame

This is the printed DataFrame:

      Alphabet  Words
    0        A    ONE
    1        C    TWO
    2        A    ONE
    3        B    ONE
    4        C    ONE
    5        A    TWO
    6        B    ONE
    7        B    TWO
    8        A    ONE
    9        A  THREE

To group the data in pandas and count the rows in each group, add another column that carries the count; let's call this column "COUNTER" in the DataFrame.

Like this:

    df['COUNTER'] = 1   # initially, set that counter to 1
    group_data = df.groupby(['Alphabet', 'Words'])['COUNTER'].sum()   # sum the counter per group
    print(group_data)

OUTPUT:

    Alphabet  Words
    A         ONE      3
              THREE    1
              TWO      1
    B         ONE      2
              TWO      1
    C         ONE      1
              TWO      1
    Name: COUNTER, dtype: int64
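A side note of mine, not part of the original answer: the helper column is not strictly needed, because size() yields the same counts directly (just without the COUNTER name):

    # Equivalent counts without adding a helper column
    print(df.groupby(['Alphabet', 'Words']).size())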

+17
Jul 21 '16 at 11:53

An idiomatic solution that uses only one groupby

    (df.groupby(['col5', 'col2']).size()
       .sort_values(ascending=False)
       .reset_index(name='count')
       .drop_duplicates(subset='col2'))

      col5 col2  count
    0    3    A      3
    1    1    D      3
    2    5    B      2
    6    3    C      1

Explanation

The result of the grouped size method is a Series with col5 and col2 in the index. From here, you could use another groupby to find the maximum count for each value of col2, but that is not necessary. You can simply sort all the values in descending order and then keep only the rows with the first appearance of each col2 value using the drop_duplicates method.
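For comparison, here is a sketch of the two-step alternative the explanation alludes to (my addition, using the question's df); GroupBy.idxmax picks the row holding the largest count for each col2, so the matching col5 is kept as well:

    # Two-step alternative: locate the row with the maximum count for each col2
    sizes = df.groupby(['col5', 'col2']).size().reset_index(name='count')
    sizes.loc[sizes.groupby('col2')['count'].idxmax()]   # ties resolve to the first occurrence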

+8
Nov 05 '17 at 19:37

If you want to add a new column (say, "count_column") containing the group counts to the DataFrame:

    df['count_column'] = df.groupby(['col5', 'col2'])['col5'].transform('count')

(I chose "col5" because it does not contain nan)

+1
Jun 06 '17 at 12:20

You can simply use the built-in count function after the groupby:

    df.groupby(['col5','col2']).count()
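Worth noting (my comment, not part of the original answer): count() reports the number of non-null values in every remaining column, so the result has one count column per column. Selecting a single column, or using size(), gives one count per group:

    # One count per group instead of one per remaining column
    df.groupby(['col5', 'col2'])['col1'].count()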
-3
Dec 02 '16 at 2:22


