How is a Pandas crosstab different from a Pandas pivot_table?

Question

pandas.crosstab and pandas.pivot_table seem to provide exactly the same functionality. Are there any differences between them?

numpy scipy pandas pivot-table crosstab

user1008537 Mar 28 '16 at 17:44

3 answers

root · Answer 1 · 2016-03-28T18:15:02+0000

The main difference between the two is that pivot_table expects your input data to already be a DataFrame: you pass the DataFrame to pivot_table and specify index / columns / values by passing column names as strings. With crosstab you don't need a DataFrame at all, since you simply pass array-like objects for index / columns / values.

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, builds a DataFrame from them, and then calls pivot_table as appropriate.

In general, use pivot_table if you already have a DataFrame, so that you don't pay the additional cost of building the same DataFrame again. If you are starting from array-like objects and only care about the pivoted data, use crosstab. In most cases, I don't think it will really matter which function you decide to use.
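To make the input difference concrete, here is a minimal sketch. The arrays, the column names (color, size, qty) and the data are all made up for illustration; the explicit qty column is only there so that pivot_table has something to count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this example.
colors = np.array(["red", "blue", "red", "blue", "red"])
sizes = np.array(["S", "S", "M", "M", "S"])

# crosstab works directly on array-like objects; no DataFrame is required.
ct = pd.crosstab(colors, sizes, rownames=["color"], colnames=["size"])

# pivot_table needs a DataFrame and refers to columns by name.
df = pd.DataFrame({"color": colors, "size": sizes, "qty": 1})
pt = pd.pivot_table(
    df, index="color", columns="size", values="qty",
    aggfunc="count", fill_value=0,
)

print(ct)
print(pt)
```

With this data both calls should produce the same table of counts, so which one to use mostly comes down to what form your data is already in.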

jezrael · Answer 2 · 2016-03-28T17:46:23+0000

They give the same result if you call pivot_table with aggfunc=len and fill_value=0:

    pd.crosstab(df['Col X'], df['Col Y'])
    pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)

EDIT: There are some notable differences:

The default aggregation differs: pivot_table uses np.mean, while crosstab counts occurrences (len).

The margins_name parameter was available only in pivot_table when this answer was written; newer pandas versions also accept it in crosstab.

In pivot_table you can use Grouper for index and columns keywords.

I think if you just need a frequency table, crosstab is the better choice.
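A small sketch of the default-aggregation difference, using a made-up DataFrame (the store, item and sales columns are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "item": ["x", "x", "x", "y", "y"],
    "sales": [10, 20, 30, 40, 60],
})

# crosstab with no values/aggfunc: a frequency table, missing pairs become 0.
counts = pd.crosstab(df["store"], df["item"])

# pivot_table with no aggfunc: the mean of the values, missing pairs become NaN.
means = pd.pivot_table(df, index="store", columns="item", values="sales")

# crosstab can aggregate too, but then both values and aggfunc must be given.
means_ct = pd.crosstab(df["store"], df["item"], values=df["sales"], aggfunc="mean")

print(counts)
print(means)
print(means_ct)
```

Here counts holds how many rows fall into each store/item pair, while means and means_ct hold the average sales per pair.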

yzerman · Answer 3 · 2019-08-29T12:48:59+0000

Unfortunately, pivot_table does not have a normalize argument.

In crosstab, the normalize argument converts counts to proportions by dividing each cell by a total that depends on the value passed:

normalize = 'index' divides each cell by the sum of its row
normalize = 'columns' divides each cell by the sum of its column
normalize = True divides each cell by the sum of all cells in the table
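A quick sketch of the three normalize modes on made-up data (the store and item columns are hypothetical). The last part is one possible workaround for pivot_table, dividing by the row totals by hand; the helper column n is an assumption of this example, not something pandas provides.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "item": ["x", "y", "x", "x"],
})

by_row = pd.crosstab(df["store"], df["item"], normalize="index")    # rows sum to 1
by_col = pd.crosstab(df["store"], df["item"], normalize="columns")  # columns sum to 1
overall = pd.crosstab(df["store"], df["item"], normalize=True)      # whole table sums to 1

# Workaround sketch for pivot_table: count first, then divide manually.
counts = pd.pivot_table(
    df.assign(n=1), index="store", columns="item", values="n",
    aggfunc="sum", fill_value=0,
)
manual_by_row = counts.div(counts.sum(axis=1), axis=0)

print(by_row)
print(manual_by_row)
```

The manual division should reproduce the normalize='index' result; dividing by counts.sum(axis=0) along axis=1, or by the grand total, would cover the other two modes in the same way.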
