I have a really simple problem, but I probably don't think "vectorized" enough to solve it efficiently. I tried two different approaches, and both have been running for a long time on two different computers. I would like to say that the race made it more exciting, but ... bleh.
group observations
I have long-format data (many rows per person, one row per person and wave), and I basically want a variable that tells me how many times a person has been observed so far, i.e. a running count within each person.
I have the first two columns; the third is the one I need:
    person  wave  obs
    pers1   1999  1
    pers1   2000  2
    pers1   2003  3
    pers2   1998  1
    pers2   2001  2
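For reference, base R's `ave()` can compute this running count without an explicit loop. A minimal sketch, assuming the data sit in a data frame (called `df` here, a name chosen only for illustration) with the `person` and `wave` columns from the table:

```r
# Example data from the table above
df <- data.frame(
  person = c("pers1", "pers1", "pers1", "pers2", "pers2"),
  wave   = c(1999, 2000, 2003, 1998, 2001),
  stringsAsFactors = FALSE
)

# Order rows by person, then by wave, so the count follows time order
df <- df[order(df$person, df$wave), ]

# ave() splits the first argument by person, applies FUN to each piece
# and writes the results back into the original row positions;
# seq_along() turns a group of n rows into 1, 2, ..., n
df$obs <- ave(seq_along(df$person), df$person, FUN = seq_along)

print(df)
```

Because `ave()` groups on the `person` column itself, the counts are also correct if different people's rows are interleaved, as long as each person's rows are in wave order.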
I have tried two approaches (below). Both are painfully slow on my 150,000 rows. I am sure I am missing something, but my searches have not turned anything up yet (the problem is hard to put into words).
Thanks for any pointers!
    # ordered dataset by persnr and year of observation
    person.obs <- person.obs[order(person.obs$PERSNR, person.obs$wave), ]
    person.obs$n.obs = 0

    # first approach: loop through people and assign range
    unp = unique(person.obs$PERSNR)
    unplength = length(unp)
    for(i in 1:unplength) {
      print(unp[i])
      person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs = 1:length(person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs)
      i=i+1
      gc()
    }

    # second approach: loop through rows and reset counter at new person
    pnr = 0
    for(i in 1:length(person.obs[,2])) {
      if(pnr!=person.obs[i,]$PERSNR) {
        pnr = person.obs[i,]$PERSNR
        e = 0
      }
      e=e+1
      person.obs[i,]$n.obs = e
      i=i+1
      gc()
    }
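The logic of the second approach (reset the counter whenever `PERSNR` changes) can be expressed without a loop once the data are sorted: `rle()` gives the length of each run of identical person numbers, and `sequence()` turns those lengths into 1..n counters. A minimal sketch, assuming `person.obs` has the `PERSNR` and `wave` columns used in the code above; the five rows here are made-up sample data:

```r
# Made-up stand-in for person.obs (same column names as in the question)
person.obs <- data.frame(
  PERSNR = c("pers2", "pers1", "pers1", "pers2", "pers1"),
  wave   = c(2001, 2003, 1999, 1998, 2000),
  stringsAsFactors = FALSE
)

# Same ordering step as in the question: by person, then by wave
person.obs <- person.obs[order(person.obs$PERSNR, person.obs$wave), ]

# After sorting, each person forms one run of identical PERSNR values.
# rle() returns the run lengths; as.character() is used because
# rle() does not accept factors.
runs <- rle(as.character(person.obs$PERSNR))

# sequence(c(3, 2)) gives 1 2 3 1 2: a counter that restarts at each person
person.obs$n.obs <- sequence(runs$lengths)

print(person.obs)
```

Both calls work on whole vectors, so there is no per-row or per-person data-frame assignment, which (together with calling `gc()` on every iteration) is the likely bottleneck in both loops.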
optimization r
Ruben