I am new to R, and almost everything I do follows a methodology I learned in other languages. However, whenever I looked for answers here, the R code was structured very differently from what I expected.
I have a data.table that contains panel data for individuals. I want to compute each individual's average value of a characteristic (here, wage) and then split the sample in two: individuals whose average is above the median of those averages, and individuals whose average is below it.
Here is the structure of my data.table, yearly:
       user  wage year
1: 65122111  9.74 2003
2: 65122111  7.85 2004
3: 65122111 97.16 2005
4: 65122111 48.22 2006
5: 65122111 91.24 2007
6: 65122111  9.35 2008
7: 65122112 80.00 2007
8: 65122112  0.00 2008
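To make this reproducible, here is a sketch that rebuilds the eight sample rows with data.table() and computes one mean per user plus the median of those means. I am assuming user is an integer ID; the printout does not show the column types.

```r
library(data.table)

# Rebuild the eight sample rows (user assumed to be an integer ID)
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# One row per user holding that user's mean wage
meanWages <- yearly[, .(meanWage = mean(wage)), by = user]
print(meanWages)

# Threshold for the split: median of the per-user means
print(median(meanWages$meanWage))
```

With these rows, user 65122111 averages about 43.93 and user 65122112 averages 40, so the median of the two averages is about 41.96 and each group ends up with one user.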
And here is what I do:
meanWages <- yearly[, list(meanWage = mean(wage)), by = user]
highWage <- meanWages[meanWage > median(meanWages[, meanWage]), user]
lowWage <- meanWages[meanWage < median(meanWages[, meanWage]), user]
yearlyHigh <- yearly[is.element(user,highWage),]
yearlyLow <- yearly[is.element(user,lowWage),]
I think this gives me what I expect (validating it is fairly cumbersome), but it seems clunky and inefficient. What would be a more efficient and concise way to do the same thing?
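For reference, here is the shortest variant I have been able to sketch, together with the checks I would rather not write by hand every time. It rests on a few assumptions: it adds a meanWage column to yearly by reference with :=, it takes the median over one row per user (unique(..., by = "user")) so that users with more years do not get more weight, and, like the two-step code, it leaves out any user whose mean equals the median exactly. The checks compare it against the two-step approach using base R's stopifnot().

```r
library(data.table)

# The eight sample rows (user assumed to be an integer ID)
yearly <- data.table(
  user = c(rep(65122111L, 6), rep(65122112L, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# Each user's mean wage as a new column, added by reference (yearly is modified in place)
yearly[, meanWage := mean(wage), by = user]

# Median over one value per user, not over all rows
medWage <- median(unique(yearly, by = "user")$meanWage)

# Strictly above / strictly below the median; a user exactly at the median is in neither
yearlyHigh <- yearly[meanWage > medWage]
yearlyLow  <- yearly[meanWage < medWage]

# Cross-check against the two-step approach (per-user table first, then subset)
meanWages <- yearly[, .(meanWage = mean(wage)), by = user]
stopifnot(
  # no user lands in both groups
  length(intersect(yearlyHigh$user, yearlyLow$user)) == 0,
  # both approaches pick the same users on each side
  setequal(unique(yearlyHigh$user), meanWages[meanWage > median(meanWage), user]),
  setequal(unique(yearlyLow$user),  meanWages[meanWage < median(meanWage), user]),
  # every row is accounted for, except rows of users sitting exactly at the median
  nrow(yearlyHigh) + nrow(yearlyLow) + nrow(yearly[meanWage == medWage]) == nrow(yearly)
)
```

On the sample rows the high group is user 65122111 (6 rows) and the low group is user 65122112 (2 rows). A yearlyLow built from highWage by mistake fails the first check, because both subsets then contain the same users. Since := changes yearly itself, running these steps on copy(yearly) would leave the original table untouched.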