First, convert the date column to pandas datetimes (rather than leaving it as strings):
```
In [11]: pd.to_datetime(df['date'], format='%d%b%Y')
Out[11]:
0   2009-06-20
1   2009-06-24
2   2009-07-15
3   2008-02-09
4   2008-02-21
5   2010-03-14
6   2010-05-02
7   2010-05-12
Name: date, dtype: datetime64[ns]
```
Note: see the docs for the possible format codes.
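In that format string, `%d` is the day of the month, `%b` the abbreviated month name and `%Y` the four-digit year. A minimal sketch of why the conversion matters, assuming the raw strings look like `20Jun2009` (the exact input spelling is an assumption): as strings the dates sort by their leading digits, as datetimes they sort chronologically.

```python
import pandas as pd

# Hypothetical raw strings in the assumed "%d%b%Y" layout (e.g. "20Jun2009").
raw = pd.Series(["20Jun2009", "09Feb2008", "14Mar2010"])

# %d = day of month, %b = abbreviated month name, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%d%b%Y")

# As strings, sorting is lexicographic (driven by the leading day digits).
string_order = raw.sort_values().tolist()
# As datetimes, sorting is chronological.
date_order = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(string_order)  # ['09Feb2008', '14Mar2010', '20Jun2009']
print(date_order)    # ['2008-02-09', '2009-06-20', '2010-03-14']
```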
```
In [12]: df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')

In [13]: df
Out[13]:
   patient       date  sequence
0      145 2009-06-20         1
1      145 2009-06-24         2
2      145 2009-07-15         3
3      582 2008-02-09         1
4      582 2008-02-21         2
5      987 2010-03-14         1
6      987 2010-05-02         2
7      987 2010-05-12         3
```
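If the rows are not already in date order within each patient, sort them first (older pandas versions spelled this `df.sort('date')`, which has since been removed in favour of `sort_values`):

```
In [14]: df = df.sort_values('date')
```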
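A small check, on a hypothetical frame, that sorting by date alone puts each patient's rows in chronological order, while the original index labels travel with the rows (which is what makes a later `sort_index` possible):

```python
import pandas as pd

# Hypothetical data: each patient's rows are deliberately out of date order.
df = pd.DataFrame({
    "patient": [145, 145, 582, 582],
    "date": pd.to_datetime(["2009-07-15", "2009-06-24", "2008-02-21", "2008-02-09"]),
})

df = df.sort_values("date")

# Within each patient, dates are now increasing.
per_patient_sorted = df.groupby("patient")["date"].apply(lambda s: s.is_monotonic_increasing)
print(per_patient_sorted.all())  # True

# The original index labels are kept, so the old order can be restored later.
print(df.index.tolist())  # [3, 2, 1, 0]
```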
Now you can group by patient and number the rows within each group with `cumcount`:

```
In [15]: g = df.groupby('patient')

In [16]: g.cumcount() + 1
Out[16]:
2    1
3    2
0    1
1    2
4    1
5    2
6    3
dtype: int64
```
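`cumcount` numbers the rows of each group 0, 1, 2, … in the order they currently appear (hence the `+ 1`), and the result is aligned to the frame's index. A sketch on a hypothetical frame comparing it with an explicit loop that does the same thing:

```python
import pandas as pd

# Hypothetical frame with interleaved patients.
df = pd.DataFrame({"patient": [582, 145, 582, 145, 145]})

# Vectorised: position of each row within its patient group, starting at 0.
via_cumcount = df.groupby("patient").cumcount() + 1

# Equivalent explicit loop: count how many times each patient has been seen so far.
seen = {}
manual = []
for p in df["patient"]:
    seen[p] = seen.get(p, 0) + 1
    manual.append(seen[p])

print(via_cumcount.tolist())  # [1, 1, 2, 2, 3]
print(manual)                 # [1, 1, 2, 2, 3]
```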
This is what you want (note that the rows are no longer in their original order):

```
In [17]: df['sequence'] = g.cumcount() + 1

In [18]: df
Out[18]:
   patient       date  sequence
2      582 2008-02-09         1
3      582 2008-02-21         2
0      145 2009-06-24         1
1      145 2009-07-15         2
4      987 2010-03-14         1
5      987 2010-05-02         2
6      987 2010-05-12         3
```
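Putting the steps together, a minimal sketch as a reusable function. The helper name `add_sequence` is hypothetical, the sample rows reuse the dates shown in Out[13] with patient 145's rows shuffled, and the `20Jun2009`-style raw strings are an assumption about the input format:

```python
import pandas as pd

def add_sequence(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number each patient's visits 1, 2, 3, ... by date."""
    out = df.copy()
    # Parse the assumed "%d%b%Y" strings into datetimes.
    out["date"] = pd.to_datetime(out["date"], format="%d%b%Y")
    # Chronological order, so cumcount numbers visits oldest-first.
    out = out.sort_values("date")
    out["sequence"] = out.groupby("patient").cumcount() + 1
    # Restore the original row order using the preserved index.
    return out.sort_index()

# Sample rows based on the dates shown above; patient 145 is out of date order.
raw = pd.DataFrame({
    "patient": [145, 145, 145, 582, 582, 987, 987, 987],
    "date": ["15Jul2009", "20Jun2009", "24Jun2009", "09Feb2008",
             "21Feb2008", "14Mar2010", "02May2010", "12May2010"],
})

result = add_sequence(raw)
print(result["sequence"].tolist())  # [3, 1, 2, 1, 2, 1, 2, 3]
```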
To restore the original order (although you may not need to), use `sort_index`; this works because sorting by date kept the original DataFrame index:

```
In [19]: df.sort_index()
Out[19]:
   patient       date  sequence
0      145 2009-06-24         1
1      145 2009-07-15         2
2      582 2008-02-09         1
3      582 2008-02-21         2
4      987 2010-03-14         1
5      987 2010-05-02         2
6      987 2010-05-12         3
```
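If only the new column is needed, the reordering can be skipped entirely: assigning a Series to a column aligns on the index, so numbers computed on a date-sorted copy land on the right rows of the unsorted frame. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame; rows are not in date order.
df = pd.DataFrame({
    "patient": [145, 145, 582, 582],
    "date": pd.to_datetime(["2009-07-15", "2009-06-24", "2008-02-21", "2008-02-09"]),
})

# cumcount runs on a date-sorted copy, but the assignment aligns on the index,
# so df itself keeps its original row order.
df["sequence"] = df.sort_values("date").groupby("patient").cumcount() + 1

print(df["sequence"].tolist())  # [2, 1, 2, 1]
```

If a patient can have two rows on the same date, `sort_values("date", kind="mergesort")` uses a stable sort, so tied rows keep their existing relative order.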