When should I use category rather than object dtype?

I have a CSV dataset with 40 features that I process with Pandas. 7 of them are continuous (int32) and the rest are categorical.

My question is:

Should I use the Pandas dtype('category') for the categorical features, or can I leave them as the default dtype('object')?
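For reference, the choice can be made when the CSV is read, through the `dtype` argument of `pd.read_csv`. This is a minimal sketch: the CSV contents and the column names (`height`, `weight`, `colour`, `country`) are hypothetical stand-ins for the real 40-feature file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: two continuous, two categorical columns.
csv_text = """height,weight,colour,country
170,65,red,FR
182,80,blue,DE
165,55,red,FR
"""

# Without a dtype mapping, string columns get the default dtype
# ('object' in the pandas versions this question is about).
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.dtypes)

# With a mapping: int32 for the continuous columns, 'category' for the rest.
dtypes = {"height": "int32", "weight": "int32", "colour": "category", "country": "category"}
df_cat = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df_cat.dtypes)
```

With many categorical columns the mapping can be built programmatically, e.g. `{col: "category" for col in categorical_cols}`, where `categorical_cols` is a list you maintain yourself.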

1 answer

Use a category when a column contains lots of repeated values that you expect to exploit.
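One way to turn this rule of thumb into a check is to measure how repetitive each column is. The sketch below is hypothetical: the helper name `to_category_if_repetitive` and the `max_unique_ratio` threshold of 0.5 are illustrative choices, not a pandas API, and the right threshold depends on your data and workload.

```python
import pandas as pd


def to_category_if_repetitive(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which object columns with
    few distinct values (distinct / rows <= max_unique_ratio) become 'category'."""
    out = df.copy()
    n_rows = len(out)
    if n_rows == 0:
        return out
    for col in out.select_dtypes(include="object").columns:
        # Count NaN as a value too, so sparse columns are judged fairly.
        if out[col].nunique(dropna=False) / n_rows <= max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


# Usage: 'exch' repeats heavily, 'trade_id' is unique per row.
trades = pd.DataFrame({
    "exch": pd.Series(["NYSE", "LSE", "NYSE", "LSE", "NYSE", "LSE"], dtype=object),
    "trade_id": pd.Series(["t1", "t2", "t3", "t4", "t5", "t6"], dtype=object),
    "size": [100, 200, 300, 400, 500, 600],
})
converted = to_category_if_repetitive(trades)
print(converted.dtypes)
```

Here `exch` has 2 distinct values in 6 rows and is converted, while `trade_id` (all distinct) is left as object, since a category with one entry per row saves nothing.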

For example, suppose I want the total size per exchange for a large table of trades. Using the default object dtype is perfectly reasonable:

In [6]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 1.25 ms per loop

But now convert the column to a category:

In [7]: trades['exch'] = trades['exch'].astype('category')

In [8]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 702 µs per loop

So in this case the category version is almost twice as fast (702 µs vs 1.25 ms per loop).
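The IPython session above uses a `trades` table that isn't shown. A self-contained way to repeat the comparison is to build a synthetic table; the exchange names, row count and size distribution below are all hypothetical, and absolute timings will vary with machine and pandas version. The sketch also reports memory via `Series.memory_usage(deep=True)`, because a categorical column stores each distinct string once plus a small integer code per row, which is the other benefit of heavy repetition.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic trades table: 5 exchanges repeated over many rows.
rng = np.random.default_rng(0)
n = 200_000
trades = pd.DataFrame({
    "exch": pd.Series(rng.choice(["NYSE", "NASDAQ", "ARCA", "BATS", "IEX"], size=n), dtype=object),
    "size": rng.integers(1, 1_000, size=n),
})
trades_cat = trades.assign(exch=trades["exch"].astype("category"))

# Same aggregation on both; observed=True groups only by categories present.
totals_obj = trades.groupby("exch")["size"].sum()
totals_cat = trades_cat.groupby("exch", observed=True)["size"].sum()

# Average wall time per groupby call.
runs = 10
t_obj = timeit.timeit(lambda: trades.groupby("exch")["size"].sum(), number=runs) / runs
t_cat = timeit.timeit(
    lambda: trades_cat.groupby("exch", observed=True)["size"].sum(), number=runs
) / runs
print(f"groupby object:   {t_obj * 1e3:.2f} ms")
print(f"groupby category: {t_cat * 1e3:.2f} ms")

# Deep memory usage of the exchange column in each representation.
mem_obj = trades["exch"].memory_usage(deep=True)
mem_cat = trades_cat["exch"].memory_usage(deep=True)
print(f"memory object:   {mem_obj:,} bytes")
print(f"memory category: {mem_cat:,} bytes")
```

The two aggregations return the same totals; only the speed and memory footprint differ.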
