R: The best way to split this sample

Question

I am new to R, and almost everything I do follows the typical approach I learned in other languages. However, whenever I looked up answers here, the code structure was very different from what I expected.

I have a data table that contains panel data on individuals. I want to compute each individual's average wage and then split the sample in two: those whose average is above the median average, and those whose average is below it.

Here is the structure of my data.table, yearly:

       user     wage year
1: 65122111     9.74 2003
2: 65122111     7.85 2004
3: 65122111    97.16 2005
4: 65122111    48.22 2006
5: 65122111    91.24 2007
6: 65122111     9.35 2008
7: 65122112    80.00 2007
8: 65122112     0.00 2008

And here is what I do:

## get mean wages
meanWages <- yearly[, list(meanWage = mean(wage)), by=(user)]
## split by median
highWage <- meanWages[meanWage > median(meanWages[, meanWage]), user]
lowWage <- meanWages[meanWage < median(meanWages[, meanWage]), user]
## split original sample
yearlyHigh <- yearly[is.element(user,highWage),]
yearlyLow <- yearly[is.element(user,lowWage),]

I guess this gives me what I expect (validating it is pretty cumbersome), but it seems very clunky and inefficient. What would be a more efficient and concise way to do the same?

+4

r data.table

FooBar Apr 14 '15 at 13:19

2

Add the per-user mean as a column by reference, then filter on it:

yearly[, meanwage := mean(wage), by=user]
yearlyHigh <- yearly[meanwage >= median(meanwage)]
yearlyLow <- yearly[meanwage < median(meanwage)]
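
One detail that may be worth checking: `median(meanwage)` evaluated on `yearly` is taken over rows, so users with more yearly observations weigh more, whereas the code in the question takes the median over one mean per user. The following is a minimal sketch of the user-level cutoff, assuming the sample rows from the question; the names `userMeans` and `cutoff` are made up for illustration.

```r
library(data.table)

# Sample rows copied from the question
yearly <- data.table(
  user = c(rep(65122111, 6), rep(65122112, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# One mean per user, so each user counts exactly once in the median
userMeans <- yearly[, .(meanwage = mean(wage)), by = user]
cutoff <- userMeans[, median(meanwage)]

# Attach the per-user mean to every row, then split on the user-level cutoff
yearly[, meanwage := mean(wage), by = user]
yearlyHigh <- yearly[meanwage >= cutoff]
yearlyLow  <- yearly[meanwage <  cutoff]
```

With `>=` and `<`, every row lands in exactly one of the two tables. The question's `>` and `<` would instead drop any user whose mean equals the median.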

+3

Yevgeny Tkach Apr 14 '15 at 13:31

shadow · Accepted Answer · 2015-04-14T15:30:45+0000

A dplyr approach: compute each user's mean wage, then keep the rows at or above the median.

library(dplyr)

yearly %>% 
  group_by(user) %>% 
  mutate(meanwage = mean(wage)) %>% 
  ungroup() %>%  # drop the grouping so the median is taken over the whole sample
  filter(meanwage >= median(meanwage))

To analyse both groups, add a category column and group by it:

yearly %>% 
  group_by(user) %>%
  mutate(meanwage = mean(wage)) %>%
  ungroup %>%
  mutate(cat = ifelse(meanwage >= median(meanwage), "high", "low")) %>%
  group_by(cat) %>%
  do(data.table("further analyses here ..."))

data.table:

yearly[, meanwage := mean(wage), by=user]
yearly[, cat := ifelse(meanwage >= median(meanwage), "high", "low")]
yearly[, "further analyses here ...", by = cat]
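
As one possible reading of the "further analyses" placeholder, the sketch below summarises each category and also splits the table into a named list. It assumes the sample rows from the question and a reasonably recent data.table (for `uniqueN()` and `split()` with `by`); `summaryByCat` and `groups` are illustrative names, not part of the original answer.

```r
library(data.table)

# Sample rows copied from the question
yearly <- data.table(
  user = c(rep(65122111, 6), rep(65122112, 2)),
  wage = c(9.74, 7.85, 97.16, 48.22, 91.24, 9.35, 80.00, 0.00),
  year = c(2003:2008, 2007:2008)
)

# Per-user mean and high/low label, both added by reference
yearly[, meanwage := mean(wage), by = user]
yearly[, cat := ifelse(meanwage >= median(meanwage), "high", "low")]

# Example analysis: number of users, number of rows, and average wage per category
summaryByCat <- yearly[, .(nUsers = uniqueN(user), nObs = .N, avgWage = mean(wage)),
                       by = cat]

# If two separate tables are still needed, split on the label
groups <- split(yearly, by = "cat")
```

`groups$high` and `groups$low` then play the role of `yearlyHigh` and `yearlyLow`, while `summaryByCat` keeps both groups in one table for comparison.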
