Pandas value_counts function explanation

Can someone explain what the line is doing

result = data.apply(pd.value_counts).fillna(0)  

here?

import pandas as pd 
from pandas import Series, DataFrame

data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})

result = data.apply(pd.value_counts).fillna(0)  

In [26]:data
Out[26]:
Qu1 Qu2 Qu3
0 1 2 1
1 3 3 5
2 4 1 2
3 3 2 4
4 4 3 4

In [27]:result
Out[28]:
Qu1 Qu2 Qu3
1 1 1 1
2 0 2 1
3 2 2 0
4 2 0 2
5 0 0 1
+4
source share
2 answers

A histogram of non-zero values ​​is created in the documents. If we look at the column Qu1of result, we can say that in the original column data.Qu1there is one, zero 2, two 3, two 4 and zero 5.

+2
source

I think the easiest way to understand what is happening is to break it.

Each value_counts column simply counts the number of occurrences of each value in the Series (i.e., 4 appears twice in the Qu1 column):

In [11]: pd.value_counts(data.Qu1)
Out[11]:
4    2
3    2
1    1
dtype: int64

, 1 5 , range(1, 6):

In [12]: pd.value_counts(data.Qu1).reindex(range(1, 6))
Out[12]:
1     1
2   NaN
3     2
4     2
5   NaN
dtype: float64

, 0, NaN, fillna:

In [13]: pd.value_counts(data.Qu1).reindex(range(1, 6)).fillna(0)
Out[13]:
1    1
2    0
3    2
4    2
5    0
dtype: float64

, :

In [14]: pd.concat((pd.value_counts(data[col]).reindex(range(1, 6)).fillna(0)
                       for col in data.columns),
                   axis=1, keys=data.columns)
Out[14]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
+7

All Articles