R - create a new variable, where each observation depends on another table and other variables in the data frame

Question

I have the following two tables:

    df <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
    Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                      A = c(56,98,43,4,90,19,59),
                      B = c(49,10,69,30,10,4,95),
                      C = c(69,2,59,8,17,84,30))

df:

    eth ZIP1
    A      1
    B      1
    B      2
    A      3
    C      5

Inc:

    ZIP2  A  B  C
       1 56 49 69
       2 98 10  2
       3 43 69 59
       4  4 30  8
       5 90 10 17
       6 19  4 84
       7 59 95 39

I would like to create an Inc variable in the df data frame where, for each row, the value is taken from the Inc table at the intersection of the row whose ZIP2 equals the case's ZIP1 and the column named by the case's eth. For my example, this gives:

    eth ZIP1 Inc
    A      1  56
    B      1  49
    B      2  10
    A      3  43
    C      5  17

A loop or some other brute-force approach can solve it, but that is slow on my data set, so I'm looking for a more elegant way, perhaps using data.table. This seems like a very standard question; I apologize for not being able to phrase an exact title for the problem (as you may have noticed), which is perhaps why a search of the forum did not turn up an existing question.

Thanks!

+8

r data.table

Yurienu Nov 13 '15 at 23:56

5 answers

Of course, this can be done in data.table:

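    library(data.table)
    setDT(df)
    setDT(Inc)  # data.table's melt() method is defined for data.tables

    # Reshape Inc to long form, then join it to df and assign the matched value
    df[
      melt(Inc, id.var="ZIP2", variable.name="eth", value.name="Inc"),
      Inc := i.Inc,
      on=c(ZIP1 = "ZIP2","eth")
    ]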

The syntax for this merge-and-assign operation is X[i, Xcol := expression, on=merge_cols].

You can run the i part, melt(Inc, id.var="ZIP2", variable.name="eth", value.name="Inc"), on its own to see how it works. Inside the join, columns from i can be referred to with the i.* prefix.
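To make the two steps concrete, here is a minimal self-contained sketch that builds the long table first and then runs the join. The name `long` is only illustrative, and `variable.factor = FALSE` is an addition (not in the answer) so that both `eth` columns are character.

```r
library(data.table)

df  <- data.table(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.table(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long form: one row per (ZIP2, eth) pair, 7 ZIP codes x 3 groups = 21 rows
long <- melt(Inc, id.vars = "ZIP2", variable.name = "eth",
             value.name = "Inc", variable.factor = FALSE)
head(long, 3)

# Update join: for each df row, look up the matching (ZIP2, eth) row in long
# and copy its Inc value into a new df column
df[long, Inc := i.Inc, on = c(ZIP1 = "ZIP2", eth = "eth")]
df
```

Rows of `long` with no counterpart in `df` (ZIP codes 4, 6, and 7 here) are simply ignored by the assignment.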

Alternatively ...

    setDT(df)
    setDT(Inc)
    df[, Inc := Inc[.(ZIP1), eth, on="ZIP2", with=FALSE], by=eth]

It is built on a similar idea. The package vignettes are a good place to learn this syntax.

+6

Frank Nov 14 '15 at 0:16

We can use row/column indexing

    df$Inc <- Inc[cbind(match(df$ZIP1, Inc$ZIP2), match(df$eth, colnames(Inc)))]
    df
    #  eth ZIP1 Inc
    #1   A    1  56
    #2   B    1  49
    #3   B    2  10
    #4   A    3  43
    #5   C    5  17
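The one-liner relies on indexing with a two-column matrix, where each row of the index is a (row, column) pair. A minimal sketch of the intermediate steps, with `rows`, `cols`, and `vals` as illustrative names that are not in the answer:

```r
df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Row position in Inc of each case's ZIP code
rows <- match(df$ZIP1, Inc$ZIP2)        # 1 1 2 3 5
# Column position in Inc of each case's eth
cols <- match(df$eth, colnames(Inc))    # 2 3 3 2 4

# Each row of the index matrix picks one cell of Inc
vals <- Inc[cbind(rows, cols)]          # 56 49 10 43 17
df$Inc <- vals
```

Note that indexing a data frame by a matrix goes through `as.matrix()`, so this works cleanly here because every column of Inc is numeric; with mixed column types the extracted values would be coerced to a common type.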

+6

akrun Nov 14 '15 at 4:51

Another option:

    library(dplyr)
    library(tidyr)

    Inc %>%
      gather(eth, value, -ZIP2) %>%
      left_join(df, ., by = c("eth", "ZIP1" = "ZIP2"))
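In current tidyr, `gather()` is superseded by `pivot_longer()`. A sketch of the same reshape-then-join with that function, assuming tidyr 1.0.0 or later; the names `inc_long` and `result` are illustrative:

```r
library(dplyr)
library(tidyr)

df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5),
                  stringsAsFactors = FALSE)
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long form: the A/B/C column names become values of a new eth column
inc_long <- pivot_longer(Inc, cols = -ZIP2, names_to = "eth", values_to = "Inc")

# Keep every row of df and attach the matching Inc value
result <- left_join(df, inc_long, by = c("eth", "ZIP1" = "ZIP2"))
result
```

Because it is a left join, any case whose ZIP code or group is missing from Inc would get NA rather than being dropped.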

+5

Steven beaupré Nov 14 '15 at 4:37

My solution (which may seem clumsy):

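    df$Inc <- NA_real_  # preallocate the new column
    for (i in seq_len(nrow(df))) {
      # pick the Inc row whose ZIP2 matches this case, and the column named by its eth
      df$Inc[i] <- Inc[Inc$ZIP2 == df$ZIP1[i], as.character(df$eth[i])]
    }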

+2

aristotll Nov 14 '15 at 0:36

Datamine r · Accepted Answer · 2015-11-14T00:37:15+0000

How about this?

    library(reshape2)
    merge(df, melt(Inc, id="ZIP2"), by.x = c("ZIP1", "eth"), by.y = c("ZIP2", "variable"))
      ZIP1 eth value
    1    1   A    56
    2    1   B    49
    3    2   B    10
    4    3   A    43
    5    5   C    17
