Python Pandas: assign last value to DataFrame group for all records of this group

Question

In Python Pandas, I have a DataFrame. I group this DataFrame by one column and want to assign the last value of another column within each group to every row of that group, as a new column.

I know that I can select the last row of the group with this command:

    import pandas as pd

    df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})
    print(df)
    print("-")
    result = df.groupby('a').nth(-1)
    print(result)

Result:

       a   b
    0  1  20
    1  1  21
    2  2  30
    3  3  40
    4  3  41
    -
        b
    a
    1  21
    2  30
    3  41

How could I assign the result of this operation back to the original DataFrame so that I have something like:

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41

+7

python pandas pandas-groupby

user7450524 Dec 21 '17 at 11:51


3 answers

Two possibilities: groupby + nth + map or replace

 df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))

Or

 df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))

You can also replace nth(-1) with last() (which happens to be a little faster), but nth gives you more flexibility in choosing which element of b to take from each group.

    df

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
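A minimal sketch of the map variant, using last() rather than nth(-1). This assumes a recent pandas: from pandas 2.0 on, nth behaves as a filter and keeps the original row index instead of the group key, so a Series built with nth(-1) no longer works as a lookup table for map. The name last_per_group is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# last() returns a Series indexed by the group key 'a',
# which is the shape Series.map needs for a lookup.
last_per_group = df.groupby('a')['b'].last()

# Look up each row's 'a' value in that Series.
df['b_new'] = df['a'].map(last_per_group)
print(df)
```

Note that last() picks the last non-null value in each group, so it can differ from a purely positional pick when b contains NaN.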

+6

cᴏʟᴅsᴘᴇᴇᴅ Dec 21 '17 at 11:54


I think this should be fast:

    df.merge(df.drop_duplicates('a', keep='last'), on='a', how='left')
    Out[797]:
       a  b_x  b_y
    0  1   20   21
    1  1   21   21
    2  2   30   30
    3  3   40   41
    4  3   41   41
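The merge above leaves the columns named b_x and b_y. One way to get the b_new column the question asks for is to rename b before merging; this is a sketch under that assumption, and last_rows and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Keep only the last row of each 'a' group and rename 'b' up front,
# so the merge does not need to add _x / _y suffixes.
last_rows = df.drop_duplicates('a', keep='last').rename(columns={'b': 'b_new'})

# A left merge keeps every original row, in the original order.
result = df.merge(last_rows, on='a', how='left')
print(result)
```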

+2

Wen Dec 21 '17 at 14:52


jezrael · Accepted Answer · 2017-12-21T11:52:59+0000

Use transform with last:

 df['b_new'] = df.groupby('a')['b'].transform('last')

Alternative:

    df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    print(df)

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41

Solution with nth and join:

    df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
    print(df)

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
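As a quick sanity check, the three approaches can be compared on the sample data. In this sketch the join variant uses last() in place of nth(-1), on the assumption that the lookup Series should be indexed by a regardless of pandas version; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# transform returns a Series aligned with the original rows.
by_transform = df.groupby('a')['b'].transform('last')
by_lambda = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

# last() (swapped in for nth(-1)) gives a Series indexed by 'a',
# which join can match against the 'a' column.
per_group = df.groupby('a')['b'].last().rename('b_new')
by_join = df.join(per_group, on='a')['b_new']

print(by_transform.tolist())
print(by_lambda.tolist())
print(by_join.tolist())
```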

Timings

    import numpy as np

    N = 10000
    df = pd.DataFrame({'a': np.random.randint(1000, size=N),
                       'b': np.random.randint(10000, size=N)})
    # print(df)

    def f(df):
        return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')

    # cᴏʟᴅsᴘᴇᴇᴅ1
    In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
    100 loops, best of 3: 3.57 ms per loop

    # cᴏʟᴅsᴘᴇᴇᴅ2
    In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
    10 loops, best of 3: 71.3 ms per loop

    # jezrael1
    In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
    1000 loops, best of 3: 1.82 ms per loop

    # jezrael2
    In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    10 loops, best of 3: 178 ms per loop

    # jezrael3
    In [219]: %timeit f(df)
    100 loops, best of 3: 3.63 ms per loop

Caveat

These timings do not show how performance changes with the number of groups, which greatly affects the timings for some of these solutions.
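To see how the number of groups affects things on your own machine, a rough sketch along these lines may help. It uses the standard timeit module instead of %timeit and compares only two of the approaches; make_frame, via_transform, via_map and the chosen group counts are all illustrative assumptions, and the absolute numbers will depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_groups, seed=0):
    """Build a random frame with roughly n_groups distinct keys in 'a'."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'a': rng.integers(n_groups, size=n_rows),
        'b': rng.integers(10000, size=n_rows),
    })


def via_transform(df):
    return df.groupby('a')['b'].transform('last')


def via_map(df):
    return df['a'].map(df.groupby('a')['b'].last())


timings = {}
for n_groups in (10, 1000, 5000):
    df = make_frame(10_000, n_groups)
    for name, func in (('transform', via_transform), ('map', via_map)):
        # Best of 3 repeats of 5 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(df), number=5, repeat=3)) / 5
        timings[(n_groups, name)] = best
        print(f'{n_groups:>5} groups  {name:<9}  {best * 1e3:.3f} ms')
```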
