Calling functions with multiple arguments when using Groupby

Question

Calling functions with multiple arguments when using Groupby

When writing functions to be used with groupby.apply or groupby.transform in pandas, if functions have several arguments, then when calling a function as part of a group, the arguments follow a comma, and not in parentheses. An example is:

def Transfunc(df, arg1, arg2, arg2):
     return something

GroupedData.transform(Transfunc, arg1, arg2, arg3)

Where the df argument is automatically passed as the first argument.

However, the same syntax is not possible when using a function to group data. Take the following example:

people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.ix[2:3, ['b', 'c']] = NA

def MeanPosition(Ind, df, Column):
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'
# This function compares each data point in column 'a' to the mean of column 'a' and return a group name based on whether it is greater than or less than the mean

people.groupby(lambda x: MeanPosition(x, people, 'a')).mean()

The above works fine, but I can't figure out why I need to wrap a function in lambda. Based on the syntax used in the conversion and application, it seems to me that the following should work just fine:

people.groupby(MeanPosition, people, 'a').mean()

- , , ?

EDIT: , , , . , , . :

def MeanPositionList(df, Column):
    return ['Greater Group' if df[Column][row] >= np.mean(df[Column]) else 'Lesser Group' for row in df.index]

Grouped = people.groupby(np.array(MeanPositionList(people, 'a')))
Grouped.mean()

, , .

+4

python lambda pandas

Woody Pride 04 . '13 3:19

1

Jeff · Answer 1 · 2013-11-04T12:47:51+0000

apply , apply .

groupby , . , ; / .

, , , ( , )

In [22]: def f(x):
   ....:     result = Series('Greater',index=x.index)
   ....:     result[x<x.mean()] = 'Lesser'
   ....:     return result
   ....: 

In [25]: df = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Joe', 'Wes', 'Wes', 'Travis'])

In [26]: df
Out[26]: 
               a         b         c         d         e
Joe    -0.293926  1.006531  0.289749 -0.186993 -0.009843
Joe    -0.228721 -0.071503  0.293486  1.126972 -0.808444
Wes     0.022887 -1.813960  1.195457  0.216040  0.287745
Wes    -1.520738 -0.303487  0.484829  1.644879  1.253210
Travis -0.061281 -0.517140  0.504645 -1.844633  0.683103

In [27]: df.groupby(df.index.values).transform(f)
Out[27]: 
              a        b        c        d        e
Joe      Lesser  Greater   Lesser   Lesser  Greater
Joe     Greater   Lesser  Greater  Greater   Lesser
Travis  Greater  Greater  Greater  Greater  Greater
Wes     Greater   Lesser  Greater   Lesser   Lesser
Wes      Lesser  Greater   Lesser  Greater  Greater

Calling functions with multiple arguments when using Groupby

More articles: