Pandas dense rank

I am working with a pandas DataFrame that looks like this:

    Year  Value
    2012     10
    2013     20
    2013     25
    2014     30

I want the equivalent of SQL's DENSE_RANK() function over Year, producing an extra column like this:

    Year  Value  Rank
    2012     10     1
    2013     20     2
    2013     25     2
    2014     30     3

How can I do this in pandas?

Thanks!

+5
3 answers

Use pd.Series.rank with method='dense'

    df['Rank'] = df.Year.rank(method='dense').astype(int)
    df
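
For reference, a minimal self-contained sketch of this approach, assuming the sample frame from the question. The `RankMin` column is not part of the original answer; it is added only to contrast dense ranking with `method='min'`, which leaves a gap after ties in the way SQL's RANK() does.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014],
                   "Value": [10, 20, 25, 30]})

# method='dense': tied years share a rank and the next rank follows without a gap.
df["Rank"] = df["Year"].rank(method="dense").astype(int)

# method='min' (illustrative contrast only): ties share a rank, but a gap follows.
df["RankMin"] = df["Year"].rank(method="min").astype(int)

print(df)
# Rank    -> 1, 2, 2, 3
# RankMin -> 1, 2, 2, 4
```

`rank` returns floats by default, which is why the answer casts to `int`.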


+6

You can convert Year to a categorical and take its codes, adding one because the codes are zero-indexed and the ranks in your example start at one.

    df['Rank'] = df.Year.astype('category').cat.codes + 1

    >>> df
       Year  Value  Rank
    0  2012     10     1
    1  2013     20     2
    2  2013     25     2
    3  2014     30     3
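
A short sketch of why this works even when the rows are not in year order: the categories inferred by `astype('category')` are the sorted unique values, so the codes follow the sorted order rather than the row order. The shuffled frame below is a made-up variation on the question's data, used only for illustration.

```python
import pandas as pd

# Hypothetical shuffled version of the question's data.
shuffled = pd.DataFrame({"Year": [2014, 2012, 2013, 2013],
                         "Value": [30, 10, 20, 25]})

# Inferred categories are sorted: 2012, 2013, 2014.
year_cat = shuffled["Year"].astype("category")
print(list(year_cat.cat.categories))

# Codes are positions in the sorted categories (zero-based), so add 1.
shuffled["Rank"] = year_cat.cat.codes + 1
print(shuffled)
# Rank -> 3, 1, 2, 2
```

Note that missing values get the code -1, so after adding one they would show up as rank 0.
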
+4

The fastest solution uses factorize:

 df['Rank'] = pd.factorize(df.Year)[0] + 1 
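
One caveat worth checking before relying on this: by default `pd.factorize` numbers values in order of first appearance, so it matches a dense rank only when Year is already sorted, as it is in the question. The sketch below assumes a shuffled copy of the data (made up for illustration) and shows the `sort=True` option, which numbers the values in sorted order instead.

```python
import pandas as pd

# Hypothetical shuffled version of the question's data.
shuffled = pd.DataFrame({"Year": [2014, 2012, 2013, 2013],
                         "Value": [30, 10, 20, 25]})

# Default: codes follow order of first appearance, not value order.
by_appearance = pd.factorize(shuffled["Year"])[0] + 1
print(by_appearance.tolist())  # [1, 2, 3, 3]

# sort=True: codes follow the sorted unique values, i.e. a dense rank.
dense = pd.factorize(shuffled["Year"], sort=True)[0] + 1
print(dense.tolist())  # [3, 1, 2, 2]

shuffled["Rank"] = dense
```

Sorting the unique values adds some work, so the timing advantage may be smaller with `sort=True`; that has not been measured here.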

Timings:

    # len(df) = 40k
    df = pd.concat([df]*10000).reset_index(drop=True)

    In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
    1000 loops, best of 3: 1.55 ms per loop

    In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
    1000 loops, best of 3: 1.22 ms per loop

    In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
    1000 loops, best of 3: 737 µs per loop
+4