Using match inside i of data.table

Question

The operator %in% is a wrapper for the match function, which returns a "vector of the same length as x". For instance:

> match(c("a", "b", "c"), c("a", "a"), nomatch = 0) > 0
## [1]  TRUE FALSE FALSE
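
To make that relationship concrete, here is a minimal base R sketch (no data.table involved) comparing %in% with the match form above. The variable names x, tbl and keep are illustrative only.

```r
# Base R only: %in% is defined in terms of match()
x   <- c("a", "b", "c")
tbl <- c("a", "a")   # lookup table containing a duplicated value

keep <- x %in% tbl

# Same logical vector as the explicit match() form
identical(keep, match(x, tbl, nomatch = 0) > 0)

# One logical value per element of x, however many duplicates tbl has
length(keep) == length(x)

# Subsetting x with it therefore returns "a" once
x[keep]
```

Because the result always has the length of x, a logical subset built from it cannot return more elements than x contains.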

When used inside i of a data.table, however,

library(data.table)
(dt1 <- data.table(v1 = c("a", "b", "c"), v2 = "dt1"))
   v1  v2
1:  a dt1
2:  b dt1
3:  c dt1
(dt2 <- data.table(v1 = c("a", "a"), v2 = "dt2"))
   v1  v2
1:  a dt2
2:  a dt2
dt1[v1 %in% dt2$v1]
   v1  v2
1:  a dt1
2:  a dt1

duplicates are obtained. Shouldn't %in% inside i of a data.table be expected to produce the same result as

dt1[dt1$v1 %in% dt2$v1]  
   v1  v2
1:  a dt1

i.e. without duplicates?

Jens Feb 09 '15 at 14:24

1 answer

David Arenburg · Accepted Answer · 2015-02-10T09:26:26+0000

This was a bug in the automatic indexing of data.table versions < 1.9.5; it is fixed in versions >= 1.9.5.
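
To see which side of that version boundary an installation falls on, something like the following sketch could be used. The helper name has_fixed_data_table is hypothetical and not part of data.table; it relies only on requireNamespace and utils::packageVersion from base R.

```r
# Hypothetical helper (not part of data.table): TRUE if data.table is
# installed and its version is at least min_version, FALSE otherwise.
has_fixed_data_table <- function(min_version = "1.9.5") {
  requireNamespace("data.table", quietly = TRUE) &&
    utils::packageVersion("data.table") >= min_version
}

if (has_fixed_data_table()) {
  message("data.table >= 1.9.5 found")
} else {
  message("data.table is missing or older than 1.9.5")
}
```

If the check reports an older version, one of the workarounds applies.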

I can think of three possible workarounds:

Turn off automatic indexing and use base R's %in%, as in

options(datatable.auto.index = FALSE)
dt1[v1 %in% dt2$v1]
##    v1  v2
## 1:  a dt1

Use %chin%, data.table's version of %in% for character vectors

dt1[v1 %chin% dt2$v1]
##    v1  v2
## 1:  a dt1

Install the development version of data.table from GitHub (in R)

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
library(data.table)
dt1 <- data.table(v1 = c("a", "b", "c"), v2 = "dt1")
dt2 <- data.table(v1 = c("a", "a"), v2 = "dt2")
dt1[v1 %in% dt2$v1]
##    v1  v2
## 1:  a dt1
