Pandas GroupBy by Element and Everything Else

I'm having a hard time using Pandas groupby. Say I have the following:

df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A', 'C'], 'Y' : [1, 2, 3, 4, 5]}) 

I want to perform a groupby operation that puts the A rows in one group and everything else in another, so something like this:

    df2.groupby(<something>).groups
    Out[1]: {'A': [2, 3], 'not A': [0, 1, 4]}

I tried things like sending a function, but couldn't make anything work. Is it possible?

Many thanks.

+7
python pandas group-by
3 answers

Yes. Groupby operations are probably among the most useful in pandas and, in my opinion, among the worst explained in the documentation.

I think you were onto something when you decided to try a function. To me this is the best approach in general, because a function is an abstraction you can reuse whenever you want to change the parameters rather than the logic. The answer provided by Dan Allan is definitely how I would do it here and is the most elegant, but for reference, this is how you achieve what you want using a function.

    def GroupFunc(x, df, col, Value):
        if df[col][x] == Value:
            return "Group 1"
        else:
            return "Group 2"

    DFGrouped = df2.groupby(lambda x: GroupFunc(x, df2, 'X', 'A'))

Any function passed as a group key is called once per index value, and the returned values are used as the group names. So in this example, x is the index value each time the function is called, and the remaining arguments are the DataFrame you are interested in, the column you are working with, and the value to test against.
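To see that the key function really receives index labels rather than rows, here is a minimal sketch. The names `label` and `seen` are made up for illustration and are not part of the answer above; the sketch only records which values the function was called with and then aggregates Y per group.

```python
import pandas as pd

df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A', 'C'], 'Y': [1, 2, 3, 4, 5]})

seen = []  # hypothetical helper: index labels the key function was called with

def label(idx):
    # idx is an index label of df2, not a row or a value from column X
    seen.append(idx)
    return 'A' if df2.loc[idx, 'X'] == 'A' else 'not A'

# The strings returned by label() become the group names
sums = df2.groupby(label)['Y'].sum()
print(sorted(set(seen)))
print(sums)
```

Because the function only ever sees the index label, it has to look the row up itself (here via `df2.loc`), which is why the DataFrame is passed in or captured from the enclosing scope.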

Please note that all of the above can also be achieved on one line using an anonymous function:

 DFGrouped = df2.groupby(lambda x: 'Group 1' if df2.X[x] == 'A' else 'Group 2') 

Hope this helps

+1
    In [3]: df2.groupby(df2['X'] == 'A').groups
    Out[3]: {False: [0, 1, 4], True: [2, 3]}
+4

To expand on @Dan Allan's answer a bit: if you want to name your groups, you can use numpy.where() to build an array of group labels:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A', 'C'], 'Y' : [1, 2, 3, 4, 5]})
    >>> m = np.where(df2['X'] == 'A', 'A', 'not A')
    >>> df2.groupby(m).groups
    {'A': [2, 3], 'not A': [0, 1, 4]}

To check whether df2['X'] is either A or B, you can use df2['X'].isin(['A', 'B']) instead of df2['X'] == 'A', or the more awkward np.logical_or(df2['X'] == 'A', df2['X'] == 'B').
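As a rough sketch of the isin() variant, the same np.where() pattern can label rows whose X is A or B. The group names 'A or B' and 'other' are arbitrary choices for this example, and the dict comprehension is only there to turn the index objects into plain lists for printing.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A', 'C'], 'Y': [1, 2, 3, 4, 5]})

# Label each row by whether X is one of the listed values
key = np.where(df2['X'].isin(['A', 'B']), 'A or B', 'other')

# Convert the group index objects to plain Python lists for readability
groups = {str(name): [int(i) for i in idx]
          for name, idx in df2.groupby(key).groups.items()}
print(groups)
```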

+2