How to use a lookup table in R without duplicating columns?

I was wondering if anyone has a good way to achieve this. I have a data frame in which each observation (= item) belonging to a particular group (= condition) has a given value:

    # Create sample data.
    item = rep(1:3, 2)                                   # 6 items
    condition = c(rep("control", 3), rep("related", 3))  # 2 conditions
    value = c(10, 11, 12, 20, 21, 22)                    # 6 values
    df = data.frame(item, condition, value)

      item condition value
    1    1   control    10
    2    2   control    11
    3    3   control    12
    4    1   related    20
    5    2   related    21
    6    3   related    22

I also have a lookup table containing the average for each group:

    # Create lookup table.
    condition = c("control", "related")
    mean = c(11, 21)
    table = data.frame(condition, mean)

      condition mean
    1   control   11
    2   related   21

I want to change the original data frame so that it contains a new label column, which says "low" if the item's value is less than the group mean and "high" otherwise. It should look like this:

    # How the output should look.
    # If the item value is less than the group mean, write "low". Write "high" otherwise.
    item = rep(1:3, 2)
    condition = c(rep("control", 3), rep("related", 3))
    value = c(10, 11, 12, 20, 21, 22)
    label = c(rep(c("low", "high", "high"), 2))
    output = data.frame(item, condition, value, label)

      item condition value label
    1    1   control    10   low
    2    2   control    11  high
    3    3   control    12  high
    4    1   related    20   low
    5    2   related    21  high
    6    3   related    22  high

If it were just a matter of copying the group mean into the original data frame, I would use merge. But I need to compare each item's value against its group mean in order to write a new label that says "low" or "high".

One thing I tried was to first merge my data frame with the lookup table, and then use ifelse to compare the value column with the mean column. This works, but I also end up with the mean column in my data frame, which I don't need (I only need the label column). Of course, I can delete the mean column manually, but that seems clumsy. So I was wondering: does anyone know a better / more elegant solution?

Thanks a bunch!

1 answer

Here are a few alternatives. (1) and (2) use only base R, and (2), (3) and (5) do not create the mean column only to delete it explicitly. In (1), (3) and (4) we used left joins, although inner joins would give the same result with this data and, in the case of (1), would allow us to write it as a single line, as shown in (1a).

1) merge

    m <- merge(df, table, all.x = TRUE)
    transform(m, label = ifelse(value < mean, "low", "high"), mean = NULL)

giving:

      condition item value label
    1   control    1    10   low
    2   control    2    11  high
    3   control    3    12  high
    4   related    1    20   low
    5   related    2    21  high
    6   related    3    22  high

1a) With an inner join, it can be shortened to:

    transform(merge(df, table),
              label = ifelse(value < mean, "low", "high"),
              mean = NULL)
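The left join in (1) and the inner join in (1a) only differ when some condition has no entry in the lookup table. The following is a minimal sketch of that case, not part of the original answer: `df2`, `lookup` and the "novel" condition are hypothetical names and data made up for illustration.

```r
# Hypothetical data: the condition "novel" has no entry in the lookup table.
df2 <- data.frame(item = 1:3,
                  condition = c("control", "related", "novel"),
                  value = c(10, 20, 30))
lookup <- data.frame(condition = c("control", "related"),
                     mean = c(11, 21))

left  <- merge(df2, lookup, all.x = TRUE)  # left join: keeps all rows of df2
inner <- merge(df2, lookup)                # inner join: drops the unmatched row

nrow(left)   # 3
nrow(inner)  # 2

# The unmatched row gets an NA mean, so its label comes out as NA as well.
transform(left, label = ifelse(value < mean, "low", "high"), mean = NULL)
```

With the data from the question every condition is present in the lookup table, so both joins return the same six rows.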

2) match

    transform(df,
      label = ifelse(value < table$mean[match(condition, table$condition)], "low", "high")
    )

giving the same rows and labels, with the columns in df's original order.
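To see why the match approach never needs a mean column, it may help to unpack it into steps. This is only an illustrative sketch using the data from the question; the intermediate names `idx` and `group_mean` are introduced here and do not appear in the original answer.

```r
# Data from the question.
df <- data.frame(item = rep(1:3, 2),
                 condition = c(rep("control", 3), rep("related", 3)),
                 value = c(10, 11, 12, 20, 21, 22))
table <- data.frame(condition = c("control", "related"), mean = c(11, 21))

# For each row of df, the position of its condition in the lookup table.
idx <- match(df$condition, table$condition)
idx
# [1] 1 1 1 2 2 2

# Indexing the lookup means by those positions gives one mean per row of df.
group_mean <- table$mean[idx]
group_mean
# [1] 11 11 11 21 21 21

# Compare row by row; the means live in a temporary vector, not in df.
df$label <- ifelse(df$value < group_mean, "low", "high")
```

The one-liner in (2) does the same thing, except that the looked-up means are never given a name.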

3) sqldf

    library(sqldf)
    sqldf("select df.*, case when value < mean then 'low' else 'high' end label
           from df
           left join 'table' using (condition)")

4) dplyr

    library(dplyr)
    df %>%
      left_join(table, by = "condition") %>%
      mutate(label = ifelse(value < mean, "low", "high")) %>%
      select(-mean)

5) data.table

    library(data.table)
    dt <- as.data.table(df)
    setkey(dt, "condition")
    dt[table, label := ifelse(value < mean, "low", "high")]