R - create a new variable, where each observation depends on another table and other variables in the data frame

I have the following two tables:

    df <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
    Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                      A = c(56,98,43,4,90,19,59),
                      B = c(49,10,69,30,10,4,95),
                      C = c(69,2,59,8,17,84,30))

    eth ZIP1      ZIP2  A  B  C
      A    1         1 56 49 69
      B    1         2 98 10  2
      B    2         3 43 69 59
      A    3         4  4 30  8
      C    5         5 90 10 17
                     6 19  4 84
                     7 59 95 30

I would like to create an Inc variable in the df data frame where, for each row, the value is taken from the Inc table at the intersection of that row's ZIP1 (matched against ZIP2) and the column named by its eth. In my example, this results in:

    eth ZIP1 Inc
      A    1  56
      B    1  49
      B    2  10
      A    3  43
      C    5  17

A loop or some other brute-force approach would solve it, but that takes a long time on my data set. I'm looking for a more elegant way, perhaps using data.table. This seems like a very standard question, and I apologize if it is a duplicate. As you may have noticed, I struggled to find a precise title for the problem, which is probably why my search on the forum turned up nothing.

Thanks!

+8
r data.table
5 answers

How about this?

    library(reshape2)
    merge(df, melt(Inc, id="ZIP2"),
          by.x = c("ZIP1", "eth"), by.y = c("ZIP2", "variable"))

      ZIP1 eth value
    1    1   A    56
    2    1   B    49
    3    2   B    10
    4    3   A    43
    5    5   C    17
+5

Of course, this can be done in data.table:

    library(data.table)
    setDT(df)
    setDT(Inc)  # data.table's melt() expects a data.table
    df[
      melt(Inc, id.vars="ZIP2", variable.name="eth", value.name="Inc"),
      Inc := i.Inc,
      on=c(ZIP1 = "ZIP2","eth")
    ]

The syntax for this update join (a merge combined with an assignment) is X[i, Xcol := expression, on=merge_cols].

You can run the i = melt(Inc, id.vars="ZIP2", variable.name="eth", value.name="Inc") part on its own to see how it works. Inside the join, columns from i can be referred to with the i.* prefix.
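To make those two steps concrete, here is a small sketch that first builds the melted lookup table and then runs the update join against it. It assumes the df and Inc data from the question. The name `long` is only an illustrative variable, `variable.factor = FALSE` is added so that eth is melted as character rather than factor, and the `on` vector is written out with both names spelled explicitly.

```r
library(data.table)

df  <- data.table(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.table(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long format: one row per (ZIP2, eth) pair; `long` is an illustrative name
long <- melt(Inc, id.vars = "ZIP2", variable.name = "eth",
             value.name = "Inc", variable.factor = FALSE)
head(long, 3)

# Update join: for rows of df that match a row of `long`,
# copy that row's Inc value (referred to as i.Inc) into df by reference
df[long, Inc := i.Inc, on = c(ZIP1 = "ZIP2", eth = "eth")]
df
```

Rows of `long` with no partner in df (for example ZIP2 = 7) are simply ignored, and df keeps its original row order.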


Alternatively ...

    # start from the original df (without an Inc column)
    setDT(df)
    setDT(Inc)
    df[, Inc := Inc[.(ZIP1), eth, on="ZIP2", with=FALSE], by=eth]

It is built on a similar idea. The package vignettes are a good place to learn this syntax.

+6

We can use row/column indexing

    df$Inc <- Inc[cbind(match(df$ZIP1, Inc$ZIP2), match(df$eth, colnames(Inc)))]
    df
    #   eth ZIP1 Inc
    # 1   A    1  56
    # 2   B    1  49
    # 3   B    2  10
    # 4   A    3  43
    # 5   C    5  17
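The indexing above relies on a base R rule: when a two-column numeric matrix is used as a single index, each of its rows is read as one (row, column) position. A minimal sketch on a toy matrix, where `m`, `idx` and `picked` are illustrative names that are not part of the answer:

```r
# Toy 3 x 4 matrix, filled column by column
m <- matrix(1:12, nrow = 3,
            dimnames = list(NULL, c("A", "B", "C", "D")))

# Each row of idx is one (row, column) position in m;
# match() turns the column names into column numbers
idx <- cbind(c(1, 3, 2), match(c("A", "B", "D"), colnames(m)))

# Picks m[1, "A"], m[3, "B"] and m[2, "D"] in a single vectorised step
picked <- m[idx]
picked
```

In the answer, the first column of the index comes from matching ZIP1 against ZIP2 and the second from matching eth against the column names of Inc, so no loop is needed.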
+6

Another option:

    library(dplyr)
    library(tidyr)
    Inc %>%
      gather(eth, value, -ZIP2) %>%
      left_join(df, ., by = c("eth", "ZIP1" = "ZIP2"))
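In recent tidyr releases, gather() is marked as superseded in favour of pivot_longer(). A sketch of the same join with that function, assuming the df and Inc data from the question and tidyr 1.0 or later. The names `long` and `res` are illustrative, and the value column is called Inc here so that the result matches the output requested in the question.

```r
library(dplyr)
library(tidyr)

df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5),
                  stringsAsFactors = FALSE)
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Reshape Inc to long format: one row per (ZIP2, eth) pair
long <- pivot_longer(Inc, cols = -ZIP2, names_to = "eth", values_to = "Inc")

# Keep every row of df and attach the matching Inc value
res <- left_join(df, long, by = c("eth", "ZIP1" = "ZIP2"))
res
```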
+5

My solution (which may seem clumsy):

    df$Inc <- NA_real_
    for (i in seq_len(nrow(df))) {
      # look up the row by ZIP code rather than by row position
      df$Inc[i] <- Inc[match(df$ZIP1[i], Inc$ZIP2), as.character(df$eth[i])]
    }
+2