Apply a function to a pandas DataFrame whose return value is based on other rows

I have a DataFrame that looks like this:

>>> import pandas
>>> df = pandas.DataFrame({'region' : ['east', 'west', 'south', 'west',
...                                    'east', 'west', 'east', 'west'],
...                        'item' : ['one', 'one', 'two', 'three',
...                                  'two', 'two', 'one', 'three'],
...                        'quantity' : [3,3,4,5,12,14,3,8],
...                        "price" : [50,50,12,35,10,10,12,12]})
>>> df
    item  price  quantity region
0    one     50         3   east
1    one     50         3   west
2    two     12         4  south
3  three     35         5   west
4    two     10        12   east
5    two     10        14   west
6    one     12         3   east
7  three     12         8   west

What I want to do is change the values in the quantity column. Each new quantity value is calculated from the set of regions that exist for that row's combination of item and price. More specifically, I want to take each quantity and multiply it by the weight of its region, as returned by a function I wrote that takes a region and the list of regions that make up the pool:

region_weight(region, list_of_regions). For this imaginary situation, let's say:

  • region east costs 1
  • region west costs 2
  • region south costs 3

Then the weight returned for east in the pool east, west is 0.3333333333333333 (1/3). The weight of south in the pool east, west, south is 0.5 (3/6 = 1/2).

So, for the first row, we look at the rows that have item one and a price of 50. There are 2 of them: one from the east region and one from the west region. The new quantity in the first row will be: 3 * region_weight("east", ["east", "west"]), or 3 * 0.3333333333333333.
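For illustration, a function with this behavior might look like the minimal sketch below. The `REGION_COST` dictionary and the function body are assumptions reconstructed from the numbers in this example (a region's cost divided by the total cost of the pool), not the actual function.

```python
# Hypothetical per-region costs, taken from the example figures above.
REGION_COST = {"east": 1, "west": 2, "south": 3}


def region_weight(region, list_of_regions):
    """Return the cost of `region` as a share of the pool's total cost."""
    total = sum(REGION_COST[r] for r in list_of_regions)
    return REGION_COST[region] / total


# Weight of east in the pool east, west: 1 / (1 + 2)
east_weight = region_weight("east", ["east", "west"])

# Weight of south in the pool east, west, south: 3 / (1 + 2 + 3)
south_weight = region_weight("south", ["east", "west", "south"])

# New quantity for the first row (original quantity 3, region east)
first_row_quantity = 3 * east_weight
print(east_weight, south_weight, first_row_quantity)
```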

I want to apply the same process to the entire quantity column. I do not know how to approach this problem with the pandas library, other than iterating over the DataFrame row by row.

1 answer

Ok, I think this does what you want:

Make a dictionary of your regional weights:

 In [1]: weights = {'east':1,'west':2,'south':3} 

The following function maps each value in a Series of regions to the value found in the weights dictionary, then normalizes the result. x is the Series of region values for one group, and w is that Series after it has been mapped through the weights dict.

 In [2]: def f(x):
    ...:     w = x.map(weights)
    ...:     return w / w.sum().astype(float)
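To see what the function does on its own, here is a small sketch that calls it on a single hand-built group. The `group` Series is a made-up stand-in for the regions of one (item, price) combination, and `float(w.sum())` is used in place of `.astype(float)` purely as a stylistic choice.

```python
import pandas as pd

weights = {'east': 1, 'west': 2, 'south': 3}


def f(x):
    # Replace each region name with its numeric weight.
    w = x.map(weights)
    # Divide by the group's total so the weights sum to 1.
    return w / float(w.sum())


# Hypothetical group: the regions of the rows with item 'one' and price 50.
group = pd.Series(['east', 'west'], index=[0, 1])
result = f(group)
print(result)
```

The result keeps the index of the input, which is what lets the weights line up with the original rows later.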

Here we group by ['item','price'] and apply the function above. The output is a series of relative weights, computed within each unique combination of item and price.

 In [3]: df.groupby(['item','price']).region.apply(f)
 Out[3]:
 0    0.333333
 1    0.666667
 2    1.000000
 3    1.000000
 4    0.333333
 5    0.666667
 6    1.000000
 7    1.000000

Finally, you can multiply df.quantity by the series above to calculate your weighted quantities.

 In [4]: df['wt_quant'] = df.groupby(['item','price']).region.apply(f) * df.quantity

 In [5]: df
 Out[5]:
     item  price  quantity region  wt_quant
 0    one     50         3   east  1.000000
 1    one     50         3   west  2.000000
 2    two     12         4  south  4.000000
 3  three     35         5   west  5.000000
 4    two     10        12   east  4.000000
 5    two     10        14   west  9.333333
 6    one     12         3   east  3.000000
 7  three     12         8   west  8.000000
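Depending on the pandas version, `groupby(...).apply` may return its result with the group keys added to the index, in which case the multiplication by `df.quantity` may no longer align row by row. As a hedged alternative, the sketch below maps the weights first and then uses `transform('sum')`, which returns a result indexed like the original rows. This is a variation on the approach above, not the code from the answer, and the intermediate names `w` and `group_total` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['east', 'west', 'south', 'west', 'east', 'west', 'east', 'west'],
    'item': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'quantity': [3, 3, 4, 5, 12, 14, 3, 8],
    'price': [50, 50, 12, 35, 10, 10, 12, 12],
})
weights = {'east': 1, 'west': 2, 'south': 3}

# Numeric weight of each row's region.
w = df['region'].map(weights)

# Total weight of each (item, price) group, broadcast back to every row.
group_total = w.groupby([df['item'], df['price']]).transform('sum')

# Quantity scaled by the region's share of its group's total weight.
df['wt_quant'] = df['quantity'] * w / group_total
print(df)
```

Because `transform` keeps the original index, no realignment is needed before assigning the new column.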