Calculate delta from values in data frame

I have this DataFrame (this is just an example, not real data):

In [1]: import pandas as pd
        my_data = [{'client_id' : '001', 'items' : '10', 'month' : 'Jan'},
                   {'client_id' : '001', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '001', 'items' : '30', 'month' : 'Mar'},
                   {'client_id' : '002', 'items' : '30', 'month' : 'Jan'},
                   {'client_id' : '002', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '002', 'items' : '15', 'month' : 'Mar'},
                   {'client_id' : '003', 'items' : '10', 'month' : 'Jan'},
                   {'client_id' : '003', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '003', 'items' : '15', 'month' : 'Mar'}]
        df = pd.DataFrame(my_data)

In  [2]: df
Out [2]:    
            client_id   month        items
         0        001     Jan           10
         1        001     Feb           20
         2        001     Mar           30
         3        002     Jan           30
         4        002     Feb           20
         5        002     Mar           15
         6        003     Jan           10
         7        003     Feb           20
         8        003     Mar           15

I want to calculate the change in items purchased between each pair of consecutive months. For example, customer "001" bought 10 more items in February (20) than in January (10), while customer "002" bought 10 fewer (20 in February, 30 in January), so the delta is -10. The final DataFrame should look like this:

In [3]: delta_df
Out [3]:   
            client_id   delta_items_feb   delta_items_mar
        0         001                10                10
        1         002               -10                -5
        2         003                10                -5

Any thoughts on how to do this?

+4
4 answers

Here is one way, using pivot_table to first group the item counts by client and month:

(I first converted the items column to integers with df.items = df.items.astype(int))

>>> import numpy as np
>>> table = df.pivot_table(values='items', index='client_id', columns='month')
>>> table = table[['Jan', 'Feb', 'Mar']]
>>> pd.DataFrame(np.diff(table.values), 
                 columns=['delta_items_feb', 'delta_items_mar'],
                 index=table.index).reset_index()

  client_id  delta_items_feb  delta_items_mar
0       001               10               10
1       002              -10               -5
2       003               10               -5

Note: pandas renamed the pivot_table arguments rows/cols to index/columns, so rows/cols only works in old versions.

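How it works:

  • pivot_table arranges the item counts with one row per client and one column per month,
  • np.diff takes the difference between consecutive columns, and the result is wrapped in a new DataFrame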
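
As a rough sketch of the same idea in current pandas, the pivot and the column-wise difference can both be done on the DataFrame itself. This is not the answer's original code: it assumes a recent pandas version, rebuilds the question's example data with items already as integers, and the `months` list is a name introduced here for illustration.

```python
import pandas as pd

# Same example data as in the question, with items already cast to int.
df = pd.DataFrame({
    "client_id": ["001", "001", "001", "002", "002", "002", "003", "003", "003"],
    "month": ["Jan", "Feb", "Mar"] * 3,
    "items": [10, 20, 30, 30, 20, 15, 10, 20, 15],
})

# Calendar order; pivot alone would sort the month columns alphabetically.
months = ["Jan", "Feb", "Mar"]

# One row per client, one column per month, columns in calendar order.
table = df.pivot(index="client_id", columns="month", values="items")[months]

# Difference between each month and the one before it.
# The first month has no predecessor, so its all-NaN column is dropped.
delta_df = table.diff(axis=1).drop(columns=months[0]).astype(int)
delta_df.columns = ["delta_items_" + m.lower() for m in delta_df.columns]
delta_df = delta_df.reset_index()
```

Using `diff(axis=1)` keeps the client index attached to the result, so there is no need to rebuild a DataFrame around a bare NumPy array.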
+1

Another way is to use groupby with diff. First compute the delta within each client:

>>> df['deltas'] = df.groupby('client_id')\
                     .apply(lambda x: x['items'].astype(int).diff()).values

  client_id  items month  deltas
0       001     10   Jan     NaN
1       001     20   Feb      10
2       001     30   Mar      10
3       002     30   Jan     NaN
4       002     20   Feb     -10
5       002     15   Mar      -5
6       003     10   Jan     NaN
7       003     20   Feb      10
8       003     15   Mar      -5

Then, if needed, pivot to get the layout you want:

>>> df.pivot(index='client_id', columns='month', values='deltas')\
      .drop('Jan', axis=1)

month       Feb  Mar
client_id       
001         10  10
002        -10  -5
003         10  -5
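
The groupby approach depends on the rows of each client already being in calendar order. A possible sketch that makes the ordering explicit is shown below; it assumes a recent pandas version, and `month_order` is a name made up for this example. It also calls `diff` directly on the grouped column rather than going through `apply`.

```python
import pandas as pd

# The question's example data; items are strings, as in the original.
df = pd.DataFrame({
    "client_id": ["001", "001", "001", "002", "002", "002", "003", "003", "003"],
    "month": ["Jan", "Feb", "Mar"] * 3,
    "items": ["10", "20", "30", "30", "20", "15", "10", "20", "15"],
})

# An ordered categorical lets sort_values follow calendar order
# instead of alphabetical order.
month_order = ["Jan", "Feb", "Mar"]
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)
df = df.sort_values(["client_id", "month"])

# Difference with the previous row inside each client group.
# The first month of every client has no predecessor and becomes NaN.
df["deltas"] = df["items"].astype(int).groupby(df["client_id"]).diff()
```

Because the result of `groupby(...).diff()` keeps the original row index, it aligns with `df` on assignment and does not need `.values`.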
+1

Here is another way, using pivot:

# convert 'items' from string to int
df["items"] = df["items"].astype(int)

# use pivot to make a column for each unique value in the "month" column
dfp = df.pivot(index='client_id', columns='month', values='items')

# calculate the Jan -> Feb delta and put it in a new column
dfp["dJF"] = dfp.Feb - dfp.Jan

month     Feb Jan Mar  dJF
client_id                 
001        20  10  30   10
002        20  30  15  -10
003        20  10  15   10
0
1) Collect the client_id values into a set, then turn the set into a sorted list client_list: ['001','002','003'].
2) Convert the month strings to ints: Jan -> 1, Feb -> 2, Mar -> 3, and so on.
3) for client in client_list:
    create a new list for this client
    for line in your_data:
        if the client ids match, add the line to the list  # filter by client_id
    sort the list by month
    loop over this client's data to generate the rows of the output table:
    delta_items_mar = item[n] - item[n-1]
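
The steps above could be sketched in plain Python roughly as follows. This is only one reading of the outline: the names `MONTH_NUMBER`, `MONTH_NAME`, `per_client` and `delta_rows` are invented for the example, and the grouping is done in a single pass with a dictionary instead of rescanning the data for every client.

```python
from collections import defaultdict

# The question's example records.
my_data = [
    {"client_id": "001", "items": "10", "month": "Jan"},
    {"client_id": "001", "items": "20", "month": "Feb"},
    {"client_id": "001", "items": "30", "month": "Mar"},
    {"client_id": "002", "items": "30", "month": "Jan"},
    {"client_id": "002", "items": "20", "month": "Feb"},
    {"client_id": "002", "items": "15", "month": "Mar"},
    {"client_id": "003", "items": "10", "month": "Jan"},
    {"client_id": "003", "items": "20", "month": "Feb"},
    {"client_id": "003", "items": "15", "month": "Mar"},
]

# Step 2: month string to int (extend for the remaining months).
MONTH_NUMBER = {"Jan": 1, "Feb": 2, "Mar": 3}
MONTH_NAME = {number: name for name, number in MONTH_NUMBER.items()}

# Steps 1 and 3: collect (month number, items) pairs per client.
per_client = defaultdict(list)
for row in my_data:
    per_client[row["client_id"]].append(
        (MONTH_NUMBER[row["month"]], int(row["items"]))
    )

delta_rows = []
for client_id in sorted(per_client):
    history = sorted(per_client[client_id])  # sort by month number
    out = {"client_id": client_id}
    # Pair every month with the one before it: item[n] - item[n-1].
    for (_, prev_items), (month, items) in zip(history, history[1:]):
        out["delta_items_" + MONTH_NAME[month].lower()] = items - prev_items
    delta_rows.append(out)
```

Each entry of `delta_rows` is a dictionary with the client id and one delta per month after the first, so the list can be passed to `pd.DataFrame` if a DataFrame is needed.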
-1
