Pandas: fill NaNs with the following non-NaN value / (# consecutive NaNs + 1)

I want to take a pandas Series and fill each run of NaNs with an average derived from the next numeric value, where: average = next numeric value / (# consecutive NaNs + 1). The numeric value that ends the run is replaced by the same average.

Here is my code so far. I just can't figure out how to split the filler column among the NaNs (and the following numeric value) in num:

    import pandas as pd

    dates = pd.date_range(start='1/1/2016', end='1/12/2016', freq='D')
    nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
    df = pd.DataFrame({'date': dates, 'num': nums})

    # Backfill with the next numeric value (same as fillna(method='bfill'),
    # which recent pandas releases deprecate).
    df['filler'] = df['num'].bfill()

Current output:

              date   num  filler
    0   2016-01-01  10.0    10.0
    1   2016-01-02  12.0    12.0
    2   2016-01-03   NaN    39.0
    3   2016-01-04   NaN    39.0
    4   2016-01-05  39.0    39.0
    5   2016-01-06  10.0    10.0
    6   2016-01-07  11.0    11.0
    7   2016-01-08   NaN    60.0
    8   2016-01-09   NaN    60.0
    9   2016-01-10   NaN    60.0
    10  2016-01-11   NaN    60.0
    11  2016-01-12  60.0    60.0

Output Required:

              date   num
    0   2016-01-01  10.0
    1   2016-01-02  12.0
    2   2016-01-03  13.0
    3   2016-01-04  13.0
    4   2016-01-05  13.0
    5   2016-01-06  10.0
    6   2016-01-07  11.0
    7   2016-01-08  12.0
    8   2016-01-09  12.0
    9   2016-01-10  12.0
    10  2016-01-11  12.0
    11  2016-01-12  12.0
python pandas pandas-groupby
1 answer
  • Take a reverse cumsum of notnull to label the groups
  • Group by those labels and use transform with mean

    csum = df.num.notnull()[::-1].cumsum()
    filler = df.num.fillna(0).groupby(csum).transform('mean')
    df.assign(filler=filler)

              date   num  filler
    0   2016-01-01  10.0    10.0
    1   2016-01-02  12.0    12.0
    2   2016-01-03   NaN    13.0
    3   2016-01-04   NaN    13.0
    4   2016-01-05  39.0    13.0
    5   2016-01-06  10.0    10.0
    6   2016-01-07  11.0    11.0
    7   2016-01-08   NaN    12.0
    8   2016-01-09   NaN    12.0
    9   2016-01-10   NaN    12.0
    10  2016-01-11   NaN    12.0
    11  2016-01-12  60.0    12.0
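For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the frame from the question and compares the result with the required output. The `expected` list is copied from the question and the name is only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
dates = pd.date_range(start='1/1/2016', end='1/12/2016', freq='D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
df = pd.DataFrame({'date': dates, 'num': nums})

# Label each run of NaNs together with the numeric value that ends it.
csum = df.num.notnull()[::-1].cumsum()

# Treat NaNs as zero so the group mean equals value / (number of NaNs + 1).
filler = df.num.fillna(0).groupby(csum).transform('mean')
result = df.assign(filler=filler)

# Values taken from the "Output Required" table in the question.
expected = [10.0, 12.0, 13.0, 13.0, 13.0, 10.0, 11.0, 12.0, 12.0, 12.0, 12.0, 12.0]
print(result.filler.tolist() == expected)
```

The original `num` column is left untouched here, since `assign` returns a new frame instead of modifying `df`.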

How it works

  • df.num.notnull().cumsum() is the standard way to find groups of contiguous NaNs. However, I wanted each group to end with the following numeric value, so I reversed the series first and then took the cumsum.
  • I want the average to count the NaNs as well. The easiest way to do this is to fill them with zero and take an ordinary mean over the groups I just made.
  • transform broadcasts each group's mean back across the existing index.
  • assign adds the new column. Even though the series was reversed, the index realigns automatically. I could have used loc, but that would overwrite the existing df. I will let the OP decide whether to overwrite it.
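To make the grouping step more concrete, the sketch below compares the labels produced by a forward cumsum with those from the reversed cumsum on the same sample values. The variable names (`forward`, `backward`, `labels`, `filled`) are only illustrative and do not come from the answer.

```python
import pandas as pd

num = pd.Series([10, 12, None, None, 39, 10, 11, None, None, None, None, 60])

# Forward cumsum: a group starts at a numeric value and includes the NaNs after it.
forward = num.notnull().cumsum()

# Reversed cumsum: a group is a run of NaNs plus the numeric value that ends it.
# sort_index() puts the labels back in the original row order for display.
backward = num.notnull()[::-1].cumsum().sort_index()

labels = pd.DataFrame({'num': num, 'forward': forward, 'backward': backward})
print(labels)

# Grouping on the reversed labels and averaging with NaNs counted as zero
# spreads each numeric value over the NaNs that precede it.
filled = num.fillna(0).groupby(backward).transform('mean')
print(filled.tolist())
```

With the forward labels, rows 2 and 3 would be grouped with the 12 at row 1. With the reversed labels they are grouped with the 39 at row 4, which is the grouping the question asks for.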