data.table join with an index produces unexpected results if the join column name is a prefix of the indexed column name

For a specific setup of two data.tables, the join does not deliver the expected results. Is there a mistake in my code, or could this be a data.table bug?

Please see the example below.

    library(data.table)

    # In the code below the join does not deliver the result I would expect
    DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
    DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
    DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
    DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
    # PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
    #    colname  colname_with_suffix lookup_result
    # 1:   test1                other            NA
    # 2:   test2                 test            NA
    # 3:   test2 includes test within            NA
    # 4:   test3                other             3

    # Expected result:
    #    colname  colname_with_suffix lookup_result
    # 1:   test1                other             1
    # 2:   test2                 test             2
    # 3:   test2 includes test within             2
    # 4:   test3                other             3

For the following alternatives, the join works as expected. The unexpected behavior seems to arise only if an index exists on a column whose name starts with the join column name and both columns have similar textual content.

    # For all following alternatives the join delivers the correct result

    # (a) Same data tables as above, but no index
    DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
    DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
    DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

    # (b) Index on DT1, but completely different values in indexed column than in join column
    DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
    DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
    DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
    DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]

    # (c) Index on DT1, similar values in indexed column, but join column name is not a prefix of indexed column name
    DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
    DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
    DT1[x.colname_with_suffix == "not found", ] # automatically creates index on x.colname_with_suffix
    DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
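One more check that might help isolate the index as the trigger is to let the index be created, inspect it, and then drop it explicitly before joining. This is only a sketch: it assumes auto-indexing is enabled (the default) and that removing the index brings the table back to the state of alternative (a). `indices()` and `setindex(DT, NULL)` are standard data.table functions; the data are the same as in the failing example.

```r
library(data.table)

# Same data as in the failing example
DT1 <- data.table(colname = c("test1", "test2", "test2", "test3"),
                  colname_with_suffix = c("other", "test", "includes test within", "other"))
DT2 <- data.table(lookup = c("test1", "test2", "test3"),
                  lookup_result = c(1, 2, 3))

# This subset may auto-create an index on colname_with_suffix
DT1[colname_with_suffix == "not found", ]
print(indices(DT1))   # names of existing indices, NULL if there are none

# Drop all indices on DT1, then run the join without any index present
setindex(DT1, NULL)
DT1[DT2, lookup_result := i.lookup_result, on = c(colname = "lookup")]
print(DT1)
```

If the join is correct after the index is dropped, setting `options(datatable.auto.index = FALSE)` before the subset should presumably avoid the problem as well, at the cost of losing automatic indexing for the session.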

SessionInfo:

    # R version 3.3.2 (2016-10-31)
    # Platform: x86_64-w64-mingw32/x64 (64-bit)
    # Running under: Windows 7 x64 (build 7601) Service Pack 1
    #
    # locale:
    # [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
    #
    # attached base packages:
    # [1] stats graphics grDevices utils datasets methods base
    #
    # other attached packages:
    # [1] data.table_1.10.0
    #
    # loaded via a namespace (and not attached):
    # [1] tools_3.3.2

Note that the same behavior is observed with data.table 1.10.4 and R version 3.4.2 under Windows as well as Ubuntu Linux 14.04.
