I have a df with names and some eligibility status dates. I would like to create an indicator that numbers each person's unique elig_end_dates in order of time (1 for the first distinct end date, 2 for the second, and so on). Here is my df:
   names date_of_claim elig_end_date
1    tom    2010-01-01    2010-07-01
2    tom    2010-05-04    2010-07-01
3    tom    2010-06-01    2014-01-01
4    tom    2010-10-10    2014-01-01
5   mary    2010-03-01    2014-06-14
6   mary    2010-05-01    2014-06-14
7   mary    2010-08-01    2014-06-14
8   mary    2010-11-01    2014-06-14
9   mary    2011-01-01    2014-06-14
10  john    2010-03-27    2011-03-01
11  john    2010-07-01    2011-03-01
12  john    2010-11-01    2011-03-01
13  john    2011-02-01    2011-03-01
Here is my desired result:
   names date_of_claim elig_end_date obs
1    tom    2010-01-01    2010-07-01   1
2    tom    2010-05-04    2010-07-01   1
3    tom    2010-06-01    2014-01-01   2
4    tom    2010-10-10    2014-01-01   2
5   mary    2010-03-01    2014-06-14   1
6   mary    2010-05-01    2014-06-14   1
7   mary    2010-08-01    2014-06-14   1
8   mary    2010-11-01    2014-06-14   1
9   mary    2011-01-01    2014-06-14   1
10  john    2010-03-27    2011-03-01   1
11  john    2010-07-01    2011-03-01   1
12  john    2010-11-01    2011-03-01   1
13  john    2011-02-01    2011-03-01   1
I found the post "R: count unique values by category" useful, but its answers return the result as a separate table rather than as a column in df.
I also tried this:
df$ob = ave(df$elig_end_date, df$elig_end_date, FUN=seq_along)
But this creates a count, and I really need an indicator.
Thank you in advance
CODE OUTPUT (which is not the correct code - just posting it as a learning point)
   names date_of_claim elig_end_date ob
1    tom    2010-01-01    2010-07-01  2
2    tom    2010-05-04    2010-07-01  2
3    tom    2010-06-01    2014-01-01  2
4    tom    2010-10-10    2014-01-01  2
5   mary    2010-03-01    2014-06-14  5
6   mary    2010-05-01    2014-06-14  5
7   mary    2010-08-01    2014-06-14  5
8   mary    2010-11-01    2014-06-14  5
9   mary    2011-01-01    2014-06-14  5
10  john    2010-03-27    2011-03-01  4
11  john    2010-07-01    2011-03-01  4
12  john    2010-11-01    2011-03-01  4
13  john    2011-02-01    2011-03-01  4
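For reference, here is a minimal sketch of one way the desired obs column might be built in base R. It assumes the rows are already sorted by date_of_claim within each person and that "indicator" means the order in which each distinct elig_end_date first appears for that person. The data frame is rebuilt by hand from the table above, and grouping by names with match() against unique() is only one possible approach.

```r
# Rebuild the example data (typed in by hand from the question)
df <- data.frame(
  names = c(rep("tom", 4), rep("mary", 5), rep("john", 4)),
  date_of_claim = as.Date(c(
    "2010-01-01", "2010-05-04", "2010-06-01", "2010-10-10",
    "2010-03-01", "2010-05-01", "2010-08-01", "2010-11-01", "2011-01-01",
    "2010-03-27", "2010-07-01", "2010-11-01", "2011-02-01"
  )),
  elig_end_date = as.Date(c(
    rep("2010-07-01", 2), rep("2014-01-01", 2),
    rep("2014-06-14", 5),
    rep("2011-03-01", 4)
  )),
  stringsAsFactors = FALSE
)

# Group by person (not by elig_end_date), then label each row with the
# position of its end date among that person's distinct end dates,
# in order of first appearance.
df$obs <- ave(
  as.integer(df$elig_end_date),  # dates as integer day counts
  df$names,
  FUN = function(x) match(x, unique(x))
)

print(df)
```

If the rows might not be in date order, sorting first, for example with df[order(df$names, df$date_of_claim), ], would be needed for the numbering to follow time.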