Pandas groupby for null values

I have data similar to this in a CSV file:

    Symbol,Action,Year
    AAPL,Buy,2001
    AAPL,Buy,2001
    BAC,Sell,2002
    BAC,Sell,2002

I can read it and group it this way

 df.groupby(['Symbol','Year']).count() 

I get

                 Action
    Symbol Year
    AAPL   2001       2
    BAC    2002       2

I would like to get this instead (order doesn't matter):

                 Action
    Symbol Year
    AAPL   2001       2
    AAPL   2002       0
    BAC    2001       0
    BAC    2002       2

Is there a way to make groupby also count the combinations that never occur, reporting them as 0?

4 answers

You can use pivot_table with unstack:

    print(df.pivot_table(index='Symbol', columns='Year', values='Action',
                         fill_value=0, aggfunc='count').unstack())

    Year  Symbol
    2001  AAPL      2
          BAC       0
    2002  AAPL      0
          BAC       2
    dtype: int64

If you need the output as a DataFrame, use to_frame:

    print(df.pivot_table(index='Symbol', columns='Year', values='Action',
                         fill_value=0, aggfunc='count')
            .unstack()
            .to_frame()
            .rename(columns={0: 'Action'}))

                 Action
    Year Symbol
    2001 AAPL         2
         BAC          0
    2002 AAPL         0
         BAC          2
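For reference, here is a minimal end-to-end sketch of the pivot_table approach, starting from the CSV text in the question. The names `csv_text` and `counts` are only illustrative, and the `astype(int)` call is an extra safeguard in case a given pandas version returns floats after filling.

```python
import io

import pandas as pd

# Sample data copied from the question; read from a string instead of a file.
csv_text = """Symbol,Action,Year
AAPL,Buy,2001
AAPL,Buy,2001
BAC,Sell,2002
BAC,Sell,2002
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count Action per (Symbol, Year); combinations that never occur become 0.
counts = (
    df.pivot_table(index="Symbol", columns="Year", values="Action",
                   aggfunc="count", fill_value=0)
    .unstack()          # Series indexed by (Year, Symbol)
    .astype(int)        # assumption: force integer counts
    .rename("Action")
    .to_frame()
)
print(counts)
```

The result is indexed by (Year, Symbol), so a single count can be looked up with something like `counts.loc[(2001, "BAC"), "Action"]`.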

If you want to do this without using pivot_table, you can try the following approach:

    # grouped counts, as in the question
    df_grouped_by = df.groupby(['Symbol', 'Year']).count()

    midx = pd.MultiIndex.from_product(
        [df['Symbol'].unique(), df['Year'].unique()],
        names=['Symbol', 'Year'])
    df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)

What this does is build a MultiIndex from the Cartesian product of the unique values in the two columns, and then reindex the grouped DataFrame with it, so that every combination missing from the groupby result is filled in with 0.
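As a self-contained illustration of the reindex idea, the sketch below rebuilds the question's data inline. The variable names `grouped`, `full_index`, and `filled` are just illustrative choices.

```python
import pandas as pd

# Same rows as the CSV in the question.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Action": ["Buy", "Buy", "Sell", "Sell"],
    "Year": [2001, 2001, 2002, 2002],
})

# Only the combinations that actually occur appear here.
grouped = df.groupby(["Symbol", "Year"]).count()

# Every (Symbol, Year) pair, whether it occurs in the data or not.
full_index = pd.MultiIndex.from_product(
    [df["Symbol"].unique(), df["Year"].unique()],
    names=["Symbol", "Year"],
)

# Missing pairs are added as new rows with a count of 0.
filled = grouped.reindex(full_index, fill_value=0)
print(filled)
```

Because `fill_value=0` is passed to `reindex` directly, the counts stay integers rather than turning into floats with NaN.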

Step 1: Create a data frame that stores, in a counts column, the count for each (Symbol, Year) combination that actually occurs:

 count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts') 

Step 2: Use pivot_table with dropna=False and fill_value=0 to get a data frame with counts for both the existing and the non-existent combinations:

 df_final = pd.pivot_table(count_df, index=['Symbol','Year'], values='counts', fill_value = 0, dropna=False, aggfunc=np.sum) 

The counts can now be retrieved as a list with:

 list(df_final['counts']) 
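Put together, the two steps might look like the sketch below. It rebuilds the question's data inline and passes `aggfunc="sum"` as a string instead of `np.sum`, which is an assumption made here to avoid the NumPy import; the exact dtype of the filled column may differ between pandas versions, so the values are converted to int before printing.

```python
import pandas as pd

# Same rows as the CSV in the question.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Action": ["Buy", "Buy", "Sell", "Sell"],
    "Year": [2001, 2001, 2002, 2002],
})

# Step 1: counts for the combinations that occur.
count_df = df.groupby(["Symbol", "Year"]).size().reset_index(name="counts")

# Step 2: dropna=False keeps every (Symbol, Year) pair; fill_value=0 fills the gaps.
df_final = pd.pivot_table(
    count_df,
    index=["Symbol", "Year"],
    values="counts",
    fill_value=0,
    dropna=False,
    aggfunc="sum",
)

# Counts as plain Python ints, one per (Symbol, Year) pair.
count_list = [int(v) for v in df_final["counts"]]
print(count_list)
```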

You can use this:

    df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
    print(df)

Output:

                 Action
    Symbol Year
    AAPL   2001       2
           2002       0
    BAC    2001       0
           2002       2
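A runnable version of the same unstack/stack round trip, with the question's data built inline, could look like this. The name `result` is illustrative, and newer pandas releases may print a FutureWarning about the `stack` implementation.

```python
import pandas as pd

# Same rows as the CSV in the question.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "BAC", "BAC"],
    "Action": ["Buy", "Buy", "Sell", "Sell"],
    "Year": [2001, 2001, 2002, 2002],
})

result = (
    df.groupby(["Symbol", "Year"])
    .count()
    .unstack(fill_value=0)  # Year moves to the columns; absent pairs become 0
    .stack()                # Year moves back into the row index
)
print(result)
```

Unstacking forces a full Symbol by Year grid, which is what creates the rows for the pairs that have no data.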