R: setting values ​​for several cases in case

I have a question that, I hope, will not be a huge obstacle for advanced users of R ...

test.data <- data.frame(case = c(1, 1, 1, 2, 2, 2, 3), year = c(2006, 2007, 2008, 2007, 2006, 2008, 2006), level = c(10, 20, 20, 12, 20, 20, 20)) 

As you could see, I have several cases for each case that differ by year. The level values ​​differ in each case, and I would like to fix it by setting each level value to the minimum level of this case. In this example, each level value for case = 1 should be 10, and each level value for case = 2 should be 12. For any particular case, I could do the following:

 test.data$level[test.data$case==1] <- min(test.data$level[test.data$case==1]) 

But since I have several hundred things to do, it will take quite a while. So I would like to ask if you have a faster solution.

+5
source share
3 answers

You can try

  library(data.table) setDT(test.data)[, level:= min(level, na.rm=TRUE), case] # case year level #1: 1 2006 10 #2: 1 2007 10 #3: 1 2008 10 #4: 2 2007 12 #5: 2 2006 12 #6: 2 2008 12 #7: 3 2006 20 

Or using dplyr

  library(dplyr) test.data %>% group_by(case) %>% mutate(level= min(level, na.rm=TRUE)) # case year level #1 1 2006 10 #2 1 2007 10 #3 1 2008 10 #4 2 2007 12 #5 2 2006 12 #6 2 2008 12 #7 3 2006 20 

Or using sqldf/dplyr

  library(sqldf) library(dplyr) sqldf('select * from "test.data" left join(select "case", min(level) as Level from "test.data" group by "case") using ("case")') %>% select(-level) # case year Level #1 1 2006 10 #2 1 2007 10 #3 1 2008 10 #4 2 2007 12 #5 2 2006 12 #6 2 2008 12 #7 3 2006 20 

Or a modification suggested by @ G.Grothendieck using only sqldf

  sqldf('select "case", year, "min(level)" as Level from "test.data" left join(select "case", min(level) from "test.data" group by "case") using ("case")') #1 1 2006 10 #2 1 2007 10 #3 1 2008 10 #4 2 2007 12 #5 2 2006 12 #6 2 2008 12 #7 3 2006 20 

Or using base R

  test.data$level <- with(test.data, ave(level, case, FUN=min)) 
+5
source

Here is a classic using the basic functions of R.

 # may not be optimal for larger datasets due to merge min.lvl <- aggregate(level ~ case, data = test.data, FUN = min) merge(x = test.data, y = min.lvl, by = "case", all.x = TRUE, sort = FALSE) case year level.x level.y 1 1 2006 10 10 2 1 2007 20 10 3 1 2008 20 10 4 2 2007 12 12 5 2 2006 20 12 6 2 2008 20 12 7 3 2006 20 20 

The second vanilla option will do things

 new.data <- by(data = test.data, INDICES = test.data$case, FUN = function(x) { x$level <- min(x$level) x }) do.call("rbind", new.data) case year level 1.1 1 2006 10 1.2 1 2007 10 1.3 1 2008 10 2.4 2 2007 12 2.5 2 2006 12 2.6 2 2008 12 3 3 2006 20 
+5
source

Alternative use of doBy

 library(doBy) summaryBy(level ~ case, id=~ year, test.data, full.dimension=TRUE, keep.names=TRUE, min) # case level year #1: 1 10 2006 #2: 1 10 2006 #3: 1 10 2006 #4: 2 12 2007 #5: 2 12 2007 #6: 2 12 2007 #7: 3 20 2006 

Or be more compact

 library(plyr) ddply(test.data, .(case), mutate, level = min(level)) # case year level #1 1 2006 10 #2 1 2007 10 #3 1 2008 10 #4 2 2007 12 #5 2 2006 12 #6 2 2008 12 #7 3 2006 20 

Another basic R method using lapply

 do.call(rbind,lapply(split(test.data, test.data$case), function(x){x$level = min(x$level); x})) # case year level #1: 1 2006 10 #2: 1 2007 10 #3: 1 2008 10 #4: 2 2007 12 #5: 2 2006 12 #6: 2 2008 12 #7: 3 2006 20 
+3
source

All Articles