Calculation of MAD (Mean Absolute Deviation) GroupBy Pandas

Question

I have a dataframe:

Type  Name  Cost
A     X     545
B     Y     789
C     Z     477
D     X     640
C     X     435
B     Z     335
A     X     850
B     Y     152

My dataframe contains all such combinations, with types ['A', 'B', 'C', 'D'] and names ['X', 'Y', 'Z']. I used the groupby method to get statistics for each specific combination, for example AX, AY, AZ. Here is the code:

 import numpy as np
 import pandas as pd

 df = pd.DataFrame({'Type': ['A','B','C','D','C','B','A','B'],
                    'Name': ['X','Y','Z','X','X','Z','X','Y'],
                    'Cost': [545,789,477,640,435,335,850,152]})
 # aggregation names are passed as strings
 df.groupby(['Name','Type'])['Cost'].agg(['mean','std'])  # need to use mad instead of std

I need to eliminate observations that deviate from the mean by more than 3 MAD, something like:

 test = df[np.abs(df.Cost-df.Cost.mean())<=(3*df.Cost.mad())]

This confuses me, because df.Cost.mad() returns the MAD of Cost over all the data, not for a specific Name/Type group. How can I combine the two?

python pandas group-by aggregate dataframe

Hypothetical Ninja Apr 24 '15 at 11:44

1 answer

Julien Spronck · Accepted Answer · 2015-04-24T12:11:04+0000

You can use groupby with transform to create new Series, aligned row by row with the original dataframe, that you can then use to filter your data.

 groups = df.groupby(['Name','Type'])
 mad = groups['Cost'].transform(lambda x: x.mad())
 dif = groups['Cost'].transform(lambda x: np.abs(x - x.mean()))
 df2 = df[dif <= 3*mad]

However, in this case not a single row is filtered out: no group has more than two rows, so each value's absolute deviation from its group mean is exactly equal to the group's mean absolute deviation, and is therefore always within 3 MAD.
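For readers on recent pandas releases: Series.mad() was deprecated in pandas 1.5 and removed in 2.0, so the call above may fail there. The following is a minimal sketch of the same groupby/transform idea that computes the mean absolute deviation by hand. The helper names mean_abs_dev and drop_outliers and the eight-row demo frame are illustrative assumptions, not part of the original question or answer.

```python
import pandas as pd


def mean_abs_dev(x):
    """Mean absolute deviation of a Series around its own mean."""
    return (x - x.mean()).abs().mean()


def drop_outliers(frame, keys, col, k=3):
    """Keep rows whose deviation from the group mean is at most k group MADs.

    Hypothetical helper: `keys` are the groupby columns, `col` the value column.
    """
    g = frame.groupby(keys)[col]
    dev = (frame[col] - g.transform('mean')).abs()   # per-row deviation
    mad = g.transform(mean_abs_dev)                  # group MAD, broadcast to rows
    return frame[dev <= k * mad]


# Data from the question.
df = pd.DataFrame({'Type': ['A', 'B', 'C', 'D', 'C', 'B', 'A', 'B'],
                   'Name': ['X', 'Y', 'Z', 'X', 'X', 'Z', 'X', 'Y'],
                   'Cost': [545, 789, 477, 640, 435, 335, 850, 152]})

# Per-group mean and MAD, in place of mean and std.
summary = df.groupby(['Name', 'Type'])['Cost'].agg(['mean', mean_abs_dev])

# With at most two rows per group, nothing is removed.
df2 = drop_outliers(df, ['Name', 'Type'], 'Cost')

# Made-up larger group (not from the question) where one value stands out.
demo = pd.DataFrame({'Name': ['X'] * 8,
                     'Type': ['A'] * 8,
                     'Cost': [100, 102, 98, 101, 99, 100, 100, 500]})
demo_kept = drop_outliers(demo, ['Name', 'Type'], 'Cost')
```

On the question's data df2 keeps every row, as explained above. In the made-up demo group, the value 500 lies more than 3 MAD from the group mean and is dropped.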
