Yes, groupby operations are probably the most useful and, in my opinion, the worst explained in the documentation.
I think you were onto something when you decided to try doing it with a function. For me, this is the best way, since the function is abstract and can therefore be reused whenever you want to do the same thing with different parameters. The answer provided by Dan Allan is definitely how I would do it and is the most elegant, but for your reference, this is how you achieve what you want using a function.
def GroupFunc(x, df, col, Value):
    if df[col][x] == Value:
        return "Group 1"
    else:
        return "Group 2"

DFGrouped = df2.groupby(lambda x: GroupFunc(x, df2, 'X', 'A'))
Any function passed as a group key is called once per index value, and the return values are used as the group names. So in this example, x in the function call is the index value, and the remaining arguments are the DataFrame you are interested in, the column you are working with, and the value to test against.
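To make the "once per index value" behaviour visible, here is a minimal sketch. The sample frame, its string index labels, and the `seen` list are my own assumptions for illustration; only the column name 'X' and the test value 'A' come from the example above.

```python
import pandas as pd

# Hypothetical sample data; the index labels are made up for illustration.
df2 = pd.DataFrame({'X': ['A', 'B', 'A', 'C']},
                   index=['r0', 'r1', 'r2', 'r3'])

seen = []  # records every argument pandas passes to the key function

def group_func(x, df, col, value):
    seen.append(x)  # x is an index label, not a row
    return "Group 1" if df[col][x] == value else "Group 2"

grouped = df2.groupby(lambda x: group_func(x, df2, 'X', 'A'))

# Map each group name to the index labels that landed in it.
groups = {name: list(labels) for name, labels in grouped.groups.items()}
print(groups)  # {'Group 1': ['r0', 'r2'], 'Group 2': ['r1', 'r3']}
```

After this runs, `seen` holds the index labels rather than any row contents, which is why the function has to look the row up in the DataFrame itself.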
Please note that all of the above can also be achieved on one line using an anonymous function:
DFGrouped = df2.groupby(lambda x: 'Group 1' if df2.X[x] == 'A' else 'Group 2')
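For reference, a rough sketch of what you can then do with the grouped object. The sample data and the numeric column 'Y' are assumptions of mine, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; 'Y' is an invented numeric column.
df2 = pd.DataFrame({'X': ['A', 'B', 'A', 'C'], 'Y': [1, 2, 3, 4]})

DFGrouped = df2.groupby(lambda x: 'Group 1' if df2.X[x] == 'A' else 'Group 2')

sizes = DFGrouped.size()        # number of rows in each group
totals = DFGrouped['Y'].sum()   # sum of 'Y' within each group
print(sizes)
print(totals)
```

With this data both groups have two rows, and the 'Y' totals are 4 for Group 1 and 6 for Group 2.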
Hope this helps
Woody Pride