Pandas: assign groupby result in dataframe to new column

I have the following toy DataFrame (the real one has 500k rows):

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})

   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

I want to group by adult, select the row for which weight is maximal, and assign that row's size value to a new column size2:

   adult size size2  weight
0  False    S     S       8
1  False    S     S      10
2  False    M     S      11
3  False    M     S       1
4  False    M     S      20
5   True    L     L      14
6   True    S     L      12

I found this one, but it doesn't work for me.

So far, I have:

df.loc[:, 'size2'] = df.groupby('adult',as_index=True)['weight','size']
                       .transform(lambda x: x.ix[x['weight'].idxmax()]['size'])
3 answers

IIUC you can use merge. I think size2 for the first group should be M, since the max weight there is 20.

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})

print(df)
   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

# one row per group: the size of the row with the maximum weight
print(df.groupby('adult').apply(lambda subf: subf['size'][subf['weight'].idxmax()]).reset_index(name='size2'))
   adult size2
0  False     M
1   True     L

# merge the per-group result back onto every row of df
print(pd.merge(df, df.groupby('adult').apply(lambda subf: subf['size'][subf['weight'].idxmax()]).reset_index(name='size2'), on=['adult']))
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
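
The same merge idea can be sketched without groupby().apply, which may be easier to run on recent pandas versions. This is only an illustration under stated assumptions: the names heaviest, lookup and result are hypothetical, and how='left' is a choice made here to keep the rows of df in their original order.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Index label of the heaviest row in each 'adult' group
# (hypothetical helper name, not from the answer above).
heaviest = df.groupby('adult')['weight'].idxmax()

# One row per group: the group key and the size of its heaviest row.
lookup = (df.loc[heaviest, ['adult', 'size']]
            .rename(columns={'size': 'size2'}))

# how='left' keeps every row of df, in its original order.
result = df.merge(lookup, on='adult', how='left')
print(result)
```

Note that the merge returns a new DataFrame with a fresh 0..n-1 index, so if the original index matters it has to be restored separately.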

You can do this with transform, loc, and values:

>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L

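Step by step: first, use transform to get, for every row, the index label of the maximum weight in its group: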

>>> df.groupby("adult")["weight"].transform("idxmax")
0    4
1    4
2    4
3    4
4    4
5    5
6    5
dtype: int64

Then use those labels to look up size with loc:

>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")]
4    M
4    M
4    M
4    M
4    M
5    L
5    L
Name: size, dtype: object

Finally, because the index of that result no longer matches the index of df, take .values and assign:

>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
array(['M', 'M', 'M', 'M', 'M', 'L', 'L'], dtype=object)
>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
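
Why the raw values are needed is perhaps easier to see with a non-default index. The following is a minimal sketch, assuming a made-up integer index 10..16 and using to_numpy() as the current spelling of .values; the names winner and picked are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2},
                  index=range(10, 17))  # assumed labels, for illustration only

# For every row, the index label of the heaviest row in its group.
winner = df.groupby('adult')['weight'].transform('idxmax')

# Look up size by those labels. The result is indexed by the winners'
# labels (14 and 15, repeated), not by the labels of df itself.
picked = df['size'].loc[winner]
print(picked.index.tolist())

# Assign positionally by dropping the index, so no label alignment happens.
df['size2'] = picked.to_numpy()
print(df)
```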

Just a more verbose version of @jazrael's answer, with your DataFrame:

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})
#    adult size  weight
# 0  False    S       8
# 1  False    S      10
# 2  False    M      11
# 3  False    M       1
# 4  False    M      20
# 5   True    L      14
# 6   True    S      12

To get the size value for the row with the maximum weight:

def size4max_weight(subf):
    """ Return size value for the max weight line """
    return subf['size'][subf['weight'].idxmax()]

Grouping by 'adult' and applying it creates a Series with False and True as index values:

>>> size2_col = df.groupby('adult').apply(size4max_weight)
>>> type(size2_col), size2_col.index
(pandas.core.series.Series, Index([False, True], dtype='object', name=u'adult'))

With reset_index we convert the Series to a DataFrame:

>>> size2_col = df.groupby('adult').apply(size4max_weight).reset_index(name='size2')
>>> size2_col
   adult size2
0  False     M
1   True     L
>>>

A pd.merge on 'adult' then does the job:

>>> pd.merge(df, size2_col, on=['adult'])
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
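
Since the per-group result is a Series keyed by the adult values, a lookup with Series.map is a possible alternative to the merge. This is a different technique from the one in the answer, shown only as a sketch; the names heaviest and size_by_adult are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Index: the 'adult' values; values: label of the heaviest row per group.
heaviest = df.groupby('adult')['weight'].idxmax()

# Translate each row label into the size stored at that label.
# The result is still indexed by the 'adult' values.
size_by_adult = heaviest.map(df['size'])

# Look up every row's 'adult' value in that small Series.
df['size2'] = df['adult'].map(size_by_adult)
print(df)
```

Unlike the merge, this assigns into df in place and leaves its index untouched.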