A single-pass solution would certainly be ideal, but here is a multi-pass solution that uses only (presumably) Cython-backed pandas functions:
```python
import numpy as np
import pandas as pd

def get_delay(ds):
    x1 = (~ds).cumsum()
    x2 = x1.where(ds, np.nan).ffill()
    return x1 - x2

date_range = pd.date_range('2010-01-01', '2010-01-06')
ds = pd.Series([False, True, False, False, True, False], index=date_range)
pd.concat([ds, get_delay(ds)], axis=1)
```

```
            Event  Last
2010-01-01  False   NaN
2010-01-02   True     0
2010-01-03  False     1
2010-01-04  False     2
2010-01-05   True     0
2010-01-06  False     1
```
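To see why this works, it helps to print the intermediate series on the same example data (a sketch, with variable names `x1`/`x2` matching the function above):

```python
import numpy as np
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
ds = pd.Series([False, True, False, False, True, False], index=date_range)

# x1 counts non-event days cumulatively: it grows by 1 on every False
# and stays flat on every True.
x1 = (~ds).cumsum()

# x2 keeps x1's value only on event days and forward-fills it, so it
# holds the non-event count as of the most recent True.
x2 = x1.where(ds, np.nan).ffill()

# The difference is the number of non-event days since the last event.
# Before the first event, x2 is NaN, so the delay is NaN too.
delay = x1 - x2
print(pd.concat([ds, x1, x2, delay], axis=1))
```

Note that the result is NaN until the first `True` occurs, which matches the first row of the output above.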
Interestingly, in some quick tests it even comes out a bit faster, presumably because it avoids row-by-row Python calls:
```python
%%timeit -n 1000
def get_delay(ds):
    x1 = (~ds).cumsum()
    x2 = x1.where(ds, np.nan).ffill()
    return x1 - x2

n = 100
events = np.random.choice([True, False], size=n)
date_range = pd.date_range('2010-01-01', periods=n)
df = pd.DataFrame(events, index=date_range, columns=['event'])
get_delay(df['event'])
```

```
1000 loops, best of 3: 1.09 ms per loop
```
versus the row-by-row `apply` approach that tracks state in a global variable:
```python
%%timeit -n 1000
last = pd.to_datetime(np.nan)  # NaT until the first event is seen

def elapsed(row):
    # The global declaration must come before any use of `last`,
    # otherwise Python 3 raises a SyntaxError.
    global last
    if not row.event:
        return row.name - last
    else:
        last = row.name
        return row.name - last

n = 100
events = np.random.choice([True, False], size=n)
date_range = pd.date_range('2010-01-01', periods=n)
df = pd.DataFrame(events, index=date_range, columns=['event'])
df.apply(elapsed, axis=1)
```

```
1000 loops, best of 3: 2.4 ms per loop
```
Perhaps there is some nuance that makes this comparison unfair, but in any case the version without a custom row-wise function is certainly no slower; here it runs roughly twice as fast.