I am trying to combine two data frames, df and myData, based on the values in a column that both contain. The df column purposefully contains nested lists, and I would like a row to join if any element of its nested list matches the myData element. I would like to keep unmatched strings in df (left join).
Here is an example, first without nested lists in df.
df = data.frame(a=1:5)
df$x1= c("a", "b", "g", "a", "a")
str(df)
'data.frame': 5 obs. of 2 variables:
$ a : int 1 2 3 4 5
$ x1: chr "a" "b" "g" "a" ...
myData <- data.frame(x1=c("a", "g", "q"), x2= c("za", "zg", "zq"), stringsAsFactors = FALSE)
Now we can join on the column x1:
df$x2 <- NA
for (id in seq_len(nrow(myData))) {
  # copy x2 into every df row whose x1 equals the current myData key
  df$x2[df$x1 %in% myData$x1[id]] <- myData$x2[id]
}
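For comparison, the same lookup can be written without a loop using base R's match(). This is only a sketch on the example data above; the name idx is introduced here for illustration.

```r
df <- data.frame(a = 1:5)
df$x1 <- c("a", "b", "g", "a", "a")
myData <- data.frame(x1 = c("a", "g", "q"), x2 = c("za", "zg", "zq"),
                     stringsAsFactors = FALSE)

# match() gives, for each df$x1, the position of its first match in
# myData$x1, or NA when there is no match (left-join behaviour)
idx <- match(df$x1, myData$x1)
df$x2 <- myData$x2[idx]
df
```

On this example it fills x2 with the same values as the loop, leaving NA for "b".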
Or using dplyr:
library(dplyr)
df = data.frame(a=1:5)
df$x1= c("a", "b", "g", "a", "a")
df %>%
  left_join(myData, by = "x1")
Now consider df with nested lists.
l1 = list(letters[1:5])
l2 = list(letters[6:10])
df = data.frame(a=1:5)
df$x1= c("a", "b", "g", l1, l2)
As expected, the for loop does not match elements inside a nested list:
df$x2 <- NA
for (id in seq_len(nrow(myData))) {
  df$x2[df$x1 %in% myData$x1[id]] <- myData$x2[id]
}
Output:
df
  a            x1   x2
1 1             a   za
2 2             b <NA>
3 3             g   zg
4 4 a, b, c, d, e <NA>
5 5 f, g, h, i, j <NA>
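My understanding of why rows 4 and 5 stay NA is that %in% (via match()) converts each list element to a single character string before comparing, so a whole vector never equals "a". The small check below illustrates this; x and inside are names made up for the illustration.

```r
x <- list("a", letters[1:5])

# Each list element is collapsed to one string before matching,
# so the second element is compared as a deparsed vector, not as "a"
as.character(x)
x %in% "a"

# Testing membership element by element looks inside each list entry
inside <- vapply(x, function(el) "a" %in% el, logical(1))
inside
```

Here x %in% "a" is TRUE only for the first element, while the element-wise test is TRUE for both.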
Using dplyr:
df %>%
left_join(myData)
causes an error:
Joining by: c("x1", "x2")
Error: cannot join on column 'x1'
Is there a way to do this join in R so that elements inside the nested lists are matched as well, perhaps with data.table? The real data is on the order of 100 000 entries.
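To make the desired result concrete, here is a rough base R sketch. It assumes that a row should match when any element of its list appears in myData$x1, and that the first such match wins if there are several. Both rules are assumptions for illustration, and I have not checked how this performs on large data.

```r
df <- data.frame(a = 1:5)
df$x1 <- c("a", "b", "g", list(letters[1:5]), list(letters[6:10]))
myData <- data.frame(x1 = c("a", "g", "q"), x2 = c("za", "zg", "zq"),
                     stringsAsFactors = FALSE)

# For each row, look up every element of x1 in myData$x1
# and keep the x2 value of the first element that matches (assumed rule)
df$x2 <- vapply(df$x1, function(el) {
  hit <- match(el, myData$x1)
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0) NA_character_ else myData$x2[hit[1]]
}, character(1))
df
```

With this rule, row 4 (a to e) would get "za" and row 5 (f to j) would get "zg", while "b" stays NA.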