Since v1.9.4, data.table is optimized to use binary search for %in% if the dataset is already keyed. Therefore, @Richard's answer should have the same performance in recent versions of data.table. (By the way, %in% had a bug when used with datatable.auto.index = TRUE, so please make sure you have data.table v1.9.6+ installed if you are going to use it.)
The following illustrates data.table using binary search when evaluating %in%:
    library(data.table)
    set.seed(123)
    dt <- data.table(a = sample(letters, 25, replace = TRUE),
                     b = sample(50:100, 25, replace = FALSE))
    dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))
    setkey(dt, a)
    options(datatable.verbose = TRUE)
    dt[a %in% dtv$vowel]
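To check that the optimization only changes how the rows are found, not which rows come back, the optimized subset can be compared with a plain vector scan. This is a minimal sketch; it assumes that writing `dt$a %in% v` (a logical vector computed with base `%in%` before data.table sees it) bypasses data.table's `%in%` optimization, which is my understanding rather than something stated above. The names `v`, `fast` and `slow` are illustrative.

```r
library(data.table)

set.seed(123)
dt  <- data.table(a = sample(letters, 25, replace = TRUE),
                  b = sample(50:100, 25, replace = FALSE))
dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))
setkey(dt, a)

v <- dtv$vowel

# `a %in% v` written directly in i: eligible for data.table's optimization
fast <- dt[a %in% v]

# Logical vector built outside the optimizer with base %in%: a plain vector scan
# (assumption: data.table does not rewrite this form)
slow <- dt[dt$a %in% v]

# b values are unique (replace = FALSE), so sorting them compares the row sets
same_rows <- identical(sort(fast$b), sort(slow$b))
print(same_rows)
```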
Anyway, you were almost there, and you can easily modify c while joining:
    dt[, c := 'consonant']   # default every row to 'consonant'
    dt[dtv, c := 'vowel']    # overwrite rows whose key a matches a vowel in dtv
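If you would rather not call `setkey()` first, the same update join can be written with the `on` argument of `[.data.table`, which names the join columns explicitly. This is an alternative spelling, not what the answer above uses; the column `c` and the vowel table come from the example, the rest is a sketch.

```r
library(data.table)

set.seed(123)
dt  <- data.table(a = sample(letters, 25, replace = TRUE),
                  b = sample(50:100, 25, replace = FALSE))
dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))

# No setkey(): join dt$a to dtv$vowel explicitly via `on`
dt[, c := 'consonant']
dt[dtv, c := 'vowel', on = c(a = "vowel")]

# Rows whose letter is a vowel are labelled 'vowel', all others 'consonant'
print(table(dt$c))
```

Because `on` does not reorder `dt`, the original row order is preserved, which can matter if the table must stay in its input order.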
Or, if you want to avoid joining unnecessary columns from dtv (in case it has any), you can join on its vowel column only:
    dt[dtv$vowel, c := 'vowel']
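A quick way to convince yourself that joining on the character vector `dtv$vowel` produces the same labels as joining on the whole `dtv` table, and that `:=` does not add rows, is to run both updates on copies of the same keyed table. `copy()` keeps the two updates from modifying each other by reference; `by_table` and `by_vector` are illustrative names.

```r
library(data.table)

set.seed(123)
dt  <- data.table(a = sample(letters, 25, replace = TRUE),
                  b = sample(50:100, 25, replace = FALSE))
dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))
setkey(dt, a)
dt[, c := 'consonant']

by_table  <- copy(dt)
by_vector <- copy(dt)

by_table[dtv, c := 'vowel']         # join on the data.table dtv
by_vector[dtv$vowel, c := 'vowel']  # join on a character vector (x is keyed)

print(identical(by_table$c, by_vector$c))
print(nrow(by_vector) == nrow(dt))  # := updates in place and never adds rows
```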
Note that I haven't used .() or J(). By default, data.table performs a binary join on the key, rather than selecting rows by position, when i is a character vector, a list or a data.table; an integer or numeric i is treated as row numbers. This matters if, for example, you want to perform a binary join on column b (which is of type integer). Compare
    setkey(dt, b)
    dt[80:85]   # integer i: treated as row numbers, not as key values
and
    dt[.(80:85)]  # or dt[J(80:85)]
    # Starting bmerge ...done in 0 secs   <~~~ binary join was triggered
    #     a  b
    # 1:  x 80
    # 2:  x 81
    # 3: NA 82
    # 4: NA 83
    # 5:  o 84
    # 6: NA 85
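The difference between an integer i (row positions) and `.()`/`J()` (a lookup on the key) can be checked directly. A minimal sketch under the same setup; it uses in-range positions `2:3` so the positional result is easy to verify, and it also shows the `on` argument as an explicit way to request the same lookup, which the answer above does not use.

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(letters, 25, replace = TRUE),
                 b = sample(50:100, 25, replace = FALSE))
setkey(dt, b)

by_position <- dt[2:3]                 # integer i: rows 2 and 3 of the sorted table
by_join     <- dt[.(80:85)]            # list i: look up b values 80..85 in the key
by_on       <- dt[.(80:85), on = "b"]  # same lookup, join column named explicitly

print(by_position)
print(by_join)
```

With the default `nomatch = NA`, the join returns one row per looked-up value, so `by_join` always has six rows and unmatched values show `NA` in `a`.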
Another difference between the two methods is that %in% won't return rows for values that have no match, whereas the join does; compare
    setkey(dt, a)
    dt[a %in% dtv$vowel]
and
    dt[dtv$vowel]
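The row counts make the difference concrete: with the default `nomatch = NA`, the keyed lookup returns one row with `NA` in `b` for every vowel that does not occur in `dt$a`, while the `%in%` filter simply drops those vowels. A sketch under the same setup; `missing_vowels` is an illustrative name.

```r
library(data.table)

set.seed(123)
dt  <- data.table(a = sample(letters, 25, replace = TRUE),
                  b = sample(50:100, 25, replace = FALSE))
dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))
setkey(dt, a)

filtered <- dt[a %in% dtv$vowel]  # matching rows only
joined   <- dt[dtv$vowel]         # plus one NA row per unmatched vowel

missing_vowels <- setdiff(dtv$vowel, dt$a)

print(missing_vowels)
print(nrow(joined) - nrow(filtered) == length(missing_vowels))
```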
In this particular case it doesn't matter, because := won't modify unmatched values, but in other cases you can use nomatch = 0L to drop them:
    dt[dtv$vowel, nomatch = 0L]   # vowels with no match in dt are dropped
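With `nomatch = 0L` the join behaves like an inner join and returns the same rows as the `%in%` filter. A minimal check; it compares the sorted `b` values (unique here because of `replace = FALSE`) so that row order and attributes such as the key do not affect the comparison.

```r
library(data.table)

set.seed(123)
dt  <- data.table(a = sample(letters, 25, replace = TRUE),
                  b = sample(50:100, 25, replace = FALSE))
dtv <- data.table(vowel = c('a', 'e', 'i', 'o', 'u'))
setkey(dt, a)

filtered <- dt[a %in% dtv$vowel]
inner    <- dt[dtv$vowel, nomatch = 0L]  # inner join: unmatched vowels dropped

print(identical(sort(filtered$b), sort(inner$b)))
```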
Remember to set options(datatable.verbose = FALSE) if you don't want data.table to be so verbose.
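Rather than toggling the global option back and forth, verbosity can be limited to a single call with the `verbose` argument of `[.data.table`, or the previous option value can be restored automatically with `on.exit()`. A sketch on a tiny hand-made table; the helper `with_verbose` is hypothetical, not part of data.table.

```r
library(data.table)

dt <- data.table(a = c('a', 'b', 'e', 'z'), b = 1:4)
setkey(dt, a)

# Verbose diagnostics for this call only; the global option is left untouched
res <- dt[a %in% c('a', 'e'), verbose = TRUE]

# Hypothetical helper: switch verbosity on while f runs, then restore the old value
with_verbose <- function(f) {
  old <- options(datatable.verbose = TRUE)
  on.exit(options(old), add = TRUE)
  f()
}

res2 <- with_verbose(function() dt[a %in% c('a', 'e')])
print(getOption("datatable.verbose"))
```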