How is a Pandas crosstab different from a Pandas pivot_table?

Both pandas.crosstab and pandas.pivot_table seem to provide exactly the same functionality. Are there any differences?

+15
3 answers

The main difference between the two is that pivot_table expects your input to already be a DataFrame; you pass the DataFrame to pivot_table and specify index / columns / values by passing the column names as strings. With crosstab you don't have to supply a DataFrame, since you simply pass array-like objects for index / columns / values.

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, builds a DataFrame from them, and then calls pivot_table accordingly.

In general, use pivot_table if you already have a DataFrame, so that you avoid the additional cost of building the same DataFrame again. If you start with array-like objects and are only interested in the pivoted data, use crosstab. In most cases, I don't think it will really matter which function you decide to use.
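To make the input difference concrete, here is a minimal sketch. The arrays city, kind and qty are made-up sample data, not something from the question; the qty column exists only so that pivot_table has something to count.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example.
city = np.array(["Oslo", "Oslo", "Rome", "Rome", "Rome"])
kind = np.array(["a", "b", "a", "a", "b"])
qty = np.array([1, 2, 3, 4, 5])

# crosstab takes the array-likes directly; with no values it counts rows.
counts_ct = pd.crosstab(city, kind, rownames=["city"], colnames=["kind"])

# pivot_table needs a DataFrame and takes the column names as strings.
df = pd.DataFrame({"city": city, "kind": kind, "qty": qty})
counts_pt = df.pivot_table(index="city", columns="kind", values="qty",
                           aggfunc="count", fill_value=0)

# Both tables should hold the same counts.
same_counts = bool((counts_ct.to_numpy() == counts_pt.to_numpy()).all())
print(counts_ct)
print(same_counts)
```

Which form is more convenient mostly depends on whether the data already lives in a DataFrame.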

+20

They give the same result if you call pivot_table with aggfunc=len and fill_value=0:

    pd.crosstab(df['Col X'], df['Col Y'])
    pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)

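EDIT: There are further differences:

By default, the aggregation differs: pivot_table takes the mean (np.mean), while crosstab counts occurrences (len).

When this was written, the margins_name parameter existed only in pivot_table; newer pandas versions accept it in crosstab as well.

In pivot_table you can use a Grouper for the index and columns keywords.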


I think if you just need a frequency table, crosstab is the better choice.
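A small sketch of the default-aggregation difference mentioned above. The DataFrame and its val column are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data; "val" is an invented numeric column.
df = pd.DataFrame({
    "Col X": ["a", "a", "b", "b"],
    "Col Y": ["u", "u", "u", "v"],
    "val": [10, 20, 30, 40],
})

# crosstab with no values: counts each (Col X, Col Y) combination.
freq = pd.crosstab(df["Col X"], df["Col Y"])

# pivot_table with no aggfunc: averages "val" within each combination.
means = df.pivot_table(index="Col X", columns="Col Y", values="val")

print(freq)
print(means)
```

Note that combinations that never occur show up as 0 in the crosstab result but as NaN in the pivot_table result, unless fill_value is passed.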

+12

Unfortunately, pivot_table does not have a normalize argument.

In crosstab, the normalize argument turns counts into proportions by dividing each cell by a sum of cells, as described below:

  • normalize = 'index' divides each cell by the sum of its row
  • normalize = 'columns' divides each cell by the sum of the column
  • normalize = True divides each cell by the sum of all cells in the table
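The three options listed above can be sketched as follows, together with one possible way to get the same row-wise proportions from pivot_table by dividing by hand. The data and the helper column n are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "Col X": ["a", "a", "b", "b"],
    "Col Y": ["u", "v", "u", "u"],
})

by_row = pd.crosstab(df["Col X"], df["Col Y"], normalize="index")
by_col = pd.crosstab(df["Col X"], df["Col Y"], normalize="columns")
overall = pd.crosstab(df["Col X"], df["Col Y"], normalize=True)

# pivot_table has no normalize argument, so divide the counts manually.
# "n" is a helper column of ones added only so there is something to sum.
counts = df.assign(n=1).pivot_table(index="Col X", columns="Col Y",
                                    values="n", aggfunc="sum", fill_value=0)
manual_by_row = counts.div(counts.sum(axis=1), axis=0)

print(by_row)
print(manual_by_row)
```

The manual division works, but for a plain frequency table crosstab with normalize is shorter.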
+1