Apply a function to a pandas DataFrame whose return value is based on other rows

Question

I have a DataFrame that looks like this:

>>> import pandas
>>> df = pandas.DataFrame({'region' : ['east', 'west', 'south', 'west',
...                                    'east', 'west', 'east', 'west'],
...                        'item' : ['one', 'one', 'two', 'three',
...                                  'two', 'two', 'one', 'three'],
...                        'quantity' : [3,3,4,5,12,14,3,8],
...                        "price" : [50,50,12,35,10,10,12,12]})
>>> df
    item  price  quantity region
0    one     50         3   east
1    one     50         3   west
2    two     12         4  south
3  three     35         5   west
4    two     10        12   east
5    two     10        14   west
6    one     12         3   east
7  three     12         8   west

What I want to do is change the values in the quantity column. Each new quantity is calculated from the set of different regions that exist for that row's combination of item and price. More specifically, I want to take each quantity and multiply it by the weight of its region, as returned by a function I wrote that takes a region and the list of regions that make up the pool:

region_weight(region, list_of_regions). For this imaginary situation, let's say:

region east costs 1
region west costs 2
region south costs 3

Then the weight returned for east in the pool (east, west) is 0.3333333333333333 (1/3). The weight of south in the pool (east, west, south) is 0.5 (3/6 = 1/2).

So, for the first row, we look at the rows that have item one and a price of 50. There are two, one from the east region and one from the west region. The new quantity in the first row will be: 3 * region_weight("east", ["east", "west"]), or 3 * 0.3333333333333333.
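The question does not show the body of region_weight, so the following is only a minimal sketch of a function that reproduces the two weights quoted above. The REGION_COSTS dictionary and the choice to count each distinct region in the pool once are assumptions made for illustration, not the asker's actual code.

```python
# Hypothetical costs, taken from the example figures in the question.
REGION_COSTS = {"east": 1, "west": 2, "south": 3}


def region_weight(region, list_of_regions):
    """Return the cost of `region` divided by the total cost of the pool.

    Assumption: each distinct region in the pool is counted once.
    """
    pool = set(list_of_regions)
    total = sum(REGION_COSTS[r] for r in pool)
    return REGION_COSTS[region] / total


print(region_weight("east", ["east", "west"]))            # 1/3
print(region_weight("south", ["east", "west", "south"]))  # 0.5
```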

I want to apply the same process to the entire quantity column. I do not know how to approach this problem with the pandas library, other than by iterating over the DataFrame row by row.

python pandas

Tristan boudreault Jan 20 '13 at 21:55

1 answer

Zelazny7 · Accepted Answer · 2013-01-21T01:40:20+0000

Ok, I think this does what you want:

Make a dictionary of your regional weights:

 In [1]: weights = {'east':1,'west':2,'south':3}

The following function maps the values of a series to the values found in the weights dictionary. x is the series of region values for one group, and w is that series after it has been mapped through the weights dict. Dividing w by its sum gives each row's share of the group's total weight.

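 In [2]: def f(x):
    ...:     # look up the weight of each region in the group
    ...:     w = x.map(weights)
    ...:     # normalize so the weights in the group sum to 1
    ...:     return w / w.sum().astype(float)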
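To see what the function does on a single group, here is a small sketch that calls it on a hand-built series standing in for the rows with item one and price 50. The series named group is made up for illustration, and the .astype(float) call is dropped on the assumption that Python 3 division is in use.

```python
import pandas as pd

weights = {"east": 1, "west": 2, "south": 3}


def f(x):
    # Look up the weight of each region in the group.
    w = x.map(weights)
    # Normalize so the weights within the group sum to 1.
    return w / w.sum()


# Hypothetical stand-in for the regions of one (item, price) group.
group = pd.Series(["east", "west"], index=[0, 1])
result = f(group)
print(result)  # east gets 1/3 of the group total, west gets 2/3
```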

Here we group by ['item','price'] and apply the function above to the region column. The output is a series of relative weights within each unique combination of item and price.

 In [3]: df.groupby(['item','price']).region.apply(f)
 Out[3]:
 0    0.333333
 1    0.666667
 2    1.000000
 3    1.000000
 4    0.333333
 5    0.666667
 6    1.000000
 7    1.000000

Finally, you can multiply df.quantity by the series above to calculate your weighted quantities.

 In [4]: df['wt_quant'] = df.groupby(['item','price']).region.apply(f) * df.quantity

 In [5]: df
 Out[5]:
     item  price  quantity region  wt_quant
 0    one     50         3   east  1.000000
 1    one     50         3   west  2.000000
 2    two     12         4  south  4.000000
 3  three     35         5   west  5.000000
 4    two     10        12   east  4.000000
 5    two     10        14   west  9.333333
 6    one     12         3   east  3.000000
 7  three     12         8   west  8.000000
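The same calculation can also be sketched with groupby(...).transform, which is not what the answer uses but returns a result aligned with the original rows. This may be the safer route on recent pandas versions, where the result of apply can carry the group keys in its index and then no longer lines up with df.quantity. The helper column name cost is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "south", "west", "east", "west", "east", "west"],
    "item": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "quantity": [3, 3, 4, 5, 12, 14, 3, 8],
    "price": [50, 50, 12, 35, 10, 10, 12, 12],
})
weights = {"east": 1, "west": 2, "south": 3}

# Per-row cost of the region (hypothetical helper column).
df["cost"] = df["region"].map(weights)

# Total cost of each (item, price) group, broadcast back to every row.
group_total = df.groupby(["item", "price"])["cost"].transform("sum")

# Quantity scaled by the row's share of its group's total cost.
df["wt_quant"] = df["quantity"] * df["cost"] / group_total
print(df[["item", "price", "quantity", "region", "wt_quant"]])
```

Like the function in the answer, this sums the cost over every row in the group, so a region that appears twice in the same group would be counted twice.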
