Pandas DataFrame: group by two columns and get counts

I have a pandas dataframe in the following format:

    df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
                       list('AAABBBBABCBDDD'),
                       [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
                       ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
                       ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1']]).T
    df.columns = ['col1', 'col2', 'col3', 'col4', 'col5']

DF:

       col1 col2 col3     col4 col5
    0   1.1    A  1.1    x/y/z    1
    1   1.1    A  1.7      x/y    3
    2   1.1    A  2.5  x/y/z/n    3
    3   2.6    B  2.6      x/u    2
    4   2.5    B  3.3        x    4
    5   3.4    B  3.8    x/u/v    2
    6   2.6    B    4    x/y/z    5
    7   2.6    A  4.2        x    3
    8   3.4    B  4.3  x/u/v/b    6
    9   3.4    C  4.5        -    3
    10  2.6    B  4.6      x/y    5
    11  1.1    D  4.7    x/y/z    1
    12  1.1    D  4.7        x    1
    13  3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns as follows:

    df.groupby(['col5','col2']).reset_index()

Output:

              index col1 col2 col3     col4 col5
    col5 col2
    1    A    0        0  1.1    A  1.1    x/y/z    1
         D    0       11  1.1    D  4.7    x/y/z    1
              1       12  1.1    D  4.7        x    1
              2       13  3.3    D  4.8  x/u/v/w    1
    2    B    0        3  2.6    B  2.6      x/u    2
              1        5  3.4    B  3.8    x/u/v    2
    3    A    0        1  1.1    A  1.7      x/y    3
              1        2  1.1    A  2.5  x/y/z/n    3
              2        7  2.6    A  4.2        x    3
         C    0        9  3.4    C  4.5        -    3
    4    B    0        4  2.5    B  3.3        x    4
    5    B    0        6  2.6    B    4    x/y/z    5
              1       10  2.6    B  4.6      x/y    5
    6    B    0        8  3.4    B  4.3  x/u/v/b    6

I want to get a count for each group, as shown below. Expected result:

    col5 col2  count
    1    A     1
    1    D     3
    2    B     2
    etc...

How do I get the expected result? And how do I find the largest count for each value of col2?

+103
python pandas dataframe
Jul 16 '13 at 14:19
6 answers

Following on from @Andy's answer, you can do the following to solve your second question:

    In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
    Out[56]:
          0
    col2
    A     3
    B     2
    C     1
    D     3
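If the [[0]] column selection looks cryptic, here is a small variation of the same idea (my addition, not part of the original answer, assuming the df defined in the question) that names the size column explicitly via reset_index(name=...):

    # Same result, but with the count column named explicitly instead of the default 0 label
    counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')
    counts.groupby('col2')['count'].max()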
+70
Jul 16 '13 at 14:53

Are you looking for size?

    In [11]: df.groupby(['col5', 'col2']).size()
    Out[11]:
    col5  col2
    1     A       1
          D       3
    2     B       2
    3     A       3
          C       1
    4     B       1
    5     B       2
    6     B       1
    dtype: int64
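If you also want the output shaped exactly like the expected result in the question, with an explicit count column, one way (an addition of mine, not part of the original answer) is to reset the index of that Series:

    # Turn the size() Series into a DataFrame with a named 'count' column
    df.groupby(['col5', 'col2']).size().reset_index(name='count')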



To get the same answer as waitkuo (to the "second question"), but a little cleaner, group by the level:

    In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
    Out[12]:
    col2
    A    3
    B    2
    C    1
    D    3
    dtype: int64
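A minor readability variant (my addition): the level can also be referred to by name rather than position, since the grouped Series keeps col5 and col2 as index level names:

    # Same as level=1, but naming the index level makes the intent explicit
    df.groupby(['col5', 'col2']).size().groupby(level='col2').max()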
+97
Jul 16 '13 at 14:37

Put the data into a pandas DataFrame and assign column names:

    import pandas as pd

    df = pd.DataFrame([['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
                       ['ONE', 'TWO', 'ONE', 'ONE', 'ONE', 'TWO', 'ONE', 'TWO', 'ONE', 'THREE']]).T
    df.columns = ['Alphabet', 'Words']
    print(df)  # print the DataFrame

This is the printed DataFrame:

      Alphabet  Words
    0        A    ONE
    1        C    TWO
    2        A    ONE
    3        B    ONE
    4        C    ONE
    5        A    TWO
    6        B    ONE
    7        B    TWO
    8        A    ONE
    9        A  THREE

To group the data in pandas and count the rows in each group, add another column that carries the count; let's call this column "COUNTER" in the DataFrame.

Like this:

    df['COUNTER'] = 1   # initially, set that counter to 1
    group_data = df.groupby(['Alphabet', 'Words'])['COUNTER'].sum()   # sum the counter per group
    print(group_data)

OUTPUT:

    Alphabet  Words
    A         ONE      3
              THREE    1
              TWO      1
    B         ONE      2
              TWO      1
    C         ONE      1
              TWO      1
    Name: COUNTER, dtype: int64
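A side note of mine, not part of the original answer: the helper column is not strictly needed, because size() yields the same counts directly (just without the COUNTER name):

    # Equivalent counts without adding a helper column
    print(df.groupby(['Alphabet', 'Words']).size())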

+17
Jul 21 '16 at 11:53

An idiomatic solution that uses only one groupby

    (df.groupby(['col5', 'col2']).size()
       .sort_values(ascending=False)
       .reset_index(name='count')
       .drop_duplicates(subset='col2'))

      col5 col2  count
    0    3    A      3
    1    1    D      3
    2    5    B      2
    6    3    C      1

Explanation

The result of the grouped size method is a Series with col5 and col2 in the index. From here, you could use another groupby to find the maximum count for each value of col2, but that is not necessary. You can simply sort all the values in descending order and then keep only the rows with the first appearance of each col2 value using the drop_duplicates method.
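For comparison, here is a sketch of the two-step alternative the explanation alludes to (my addition, using the question's df); GroupBy.idxmax picks the row holding the largest count for each col2, so the matching col5 is kept as well:

    # Two-step alternative: locate the row with the maximum count for each col2
    sizes = df.groupby(['col5', 'col2']).size().reset_index(name='count')
    sizes.loc[sizes.groupby('col2')['count'].idxmax()]   # ties resolve to the first occurrence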

+8
Nov 05 '17 at 19:37

If you want to add a new column (say, "count_column") containing the group counts to the DataFrame:

    df['count_column'] = df.groupby(['col5', 'col2'])['col5'].transform('count')

(I chose "col5" because it does not contain nan)

+1
Jun 06 '17 at 12:20

You can simply use the built-in count function after the groupby:

    df.groupby(['col5','col2']).count()
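Worth noting (my comment, not part of the original answer): count() reports the number of non-null values in every remaining column, so the result has one count column per column. Selecting a single column, or using size(), gives one count per group:

    # One count per group instead of one per remaining column
    df.groupby(['col5', 'col2'])['col1'].count()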
-3
Dec 02 '16 at 2:22


