When writing functions to be used with groupby.apply or groupby.transform in pandas, any extra arguments the function takes are passed after the function name, separated by commas, rather than in parentheses as in a normal call. For example:
def Transfunc(df, arg1, arg2, arg3):
    return something
GroupedData.transform(Transfunc, arg1, arg2, arg3)
Here the df argument (the group) is passed automatically as the first argument.
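To make the pattern concrete, here is a minimal runnable sketch. The function name scale_and_shift, its parameters, and the sample values are made up for illustration; it uses a single column so that each group arrives as a Series.

```python
import pandas as pd

# Hypothetical transform function: the group is passed first,
# followed by the extra positional arguments given to transform().
def scale_and_shift(values, factor, offset):
    return (values - values.mean()) * factor + offset

# Small made-up data set for illustration only.
df = pd.DataFrame({"key": ["a", "a", "b", "b"],
                   "val": [1.0, 3.0, 10.0, 20.0]})

# factor=2.0 and offset=1.0 follow the function name, separated by commas.
result = df.groupby("key")["val"].transform(scale_and_shift, 2.0, 1.0)
print(result.tolist())
```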
However, the same syntax is not possible when using a function to group data. Take the following example:
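import numpy as np
from pandas import DataFrame
people = DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan  # set columns 'b' and 'c' to NaN for the third row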
def MeanPosition(Ind, df, Column):
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'
people.groupby(lambda x: MeanPosition(x, people, 'a')).mean()
The above works fine, but I can't figure out why I need to wrap the function in a lambda. Based on the syntax used with transform and apply, it seems to me that the following should work just fine:
people.groupby(MeanPosition, people, 'a').mean()
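Can anyone explain why this does not work, or whether there is a way to pass the extra arguments without a lambda?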
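For comparison, one lambda-free variant can be sketched with functools.partial. This relies on the fact that groupby, unlike transform and apply, has no *args parameter to forward, so anything after the first positional argument is read as one of groupby's own options. The fixed sample values below are made up so the result is reproducible; this is a sketch, not necessarily the recommended approach.

```python
from functools import partial

import numpy as np
import pandas as pd

def MeanPosition(Ind, df, Column):
    if df[Column][Ind] >= np.mean(df[Column]):
        return 'Greater Group'
    else:
        return 'Lesser Group'

# Made-up fixed values instead of random data, for a reproducible result.
people = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                       'b': [10.0, 20.0, 30.0, 40.0]},
                      index=['Joe', 'Steve', 'Wes', 'Jim'])

# Bind df and Column by keyword; groupby then calls the partial
# with each index label as the only positional argument (Ind).
by_mean = partial(MeanPosition, df=people, Column='a')
result = people.groupby(by_mean).mean()
print(result)
```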
EDIT: An alternative that avoids the lambda is to build the list of group labels first and group by that list:
def MeanPositionList(df, Column):
    return ['Greater Group' if df[Column][row] >= np.mean(df[Column]) else 'Lesser Group' for row in df.index]
Grouped = people.groupby(np.array(MeanPositionList(people, 'a')))
Grouped.mean()
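The same labels can also be computed in one vectorized step with np.where, which avoids recomputing the column mean for every row. A minimal sketch, again with made-up fixed values:

```python
import numpy as np
import pandas as pd

# Made-up fixed values for illustration only.
people = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                       'b': [10.0, 20.0, 30.0, 40.0]},
                      index=['Joe', 'Steve', 'Wes', 'Jim'])

# One label per row: compare the whole column against its mean at once.
labels = np.where(people['a'] >= people['a'].mean(),
                  'Greater Group', 'Lesser Group')

# An array with one entry per row can be passed to groupby directly.
result = people.groupby(labels).mean()
print(result)
```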