Stepwise join / regrouping / merge in R using data.table X[Y] syntax

I have two data.tables: samples and resources.

resources is linked to samples through a primary and a secondary identifier. I want to first join the info from resources onto samples using the primary identifier, and only where this yields NA fall back to the secondary identifier (all in the same data.table command chain).

    # resources:
       primary secondary info
    1:      17        42  "I"
    2:      18        NA  "J"
    3:      19        43  "K"

    # samples:
       name primary secondary
    1:  "a"      17        55
    2:  "b"       0        42
    3:  "c"      18        42

Desired result:

    # joined tables:
       name info   # primary secondary
    1:  "a"  "I"
    2:  "b"  "I"
    3:  "c"  "J"

The first join, through primary, is easy; it produces

    # Update:
    samples <- data.table(name = letters[1:3], primary = c(17, 0, 18), secondary = c(55, 42, 42))
    resources <- data.table(primary = 17:19, secondary = c(42, NA, 43), info = LETTERS[9:11])

    # first join:
    setkey(samples, primary)
    setkey(resources, primary)
    samples[resources]

       name info   # primary secondary
    1:  "a"  "I"
    2:  "b"   NA
    3:  "c"  "J"

But then what? I need to re-key samples with setkey(samples, secondary), right? And then restrict the second join to only those rows that came out as NA. But none of this fits into one command chain (and imagine there were more than two criteria ...). How can I achieve this more succinctly?

... updated with code for data.tables.

+7
r data.table
3 answers

While you can do it in one line, I think that hides the meaning of what you are doing, makes things incredibly difficult to read / understand / debug / remember what the hell you did a month from now, and is just a bad idea.

Smaller, much more easily digestible pieces are the way to go, IMO:

    setkey(samples, primary)
    setkey(resources, primary)
    samples[resources, info := i.info]

    setkey(samples, secondary)
    setkey(resources, secondary)
    samples[resources, info := ifelse(is.na(info), i.info, info)]

    samples
    #   name primary secondary info
    #1:    b       0        42    I
    #2:    c      18        42    J
    #3:    a      17        55    I

    # keep going with tertiary and so on if you like

As @nachti noted in the comments, you may need to add allow.cartesian=TRUE for versions prior to 1.9.5 depending on your data.
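
For more than two criteria, the same stepwise fallback can be sketched as a loop over the identifier columns. This is only a sketch under some assumptions: it uses `on=` joins instead of `setkey()` and `fifelse()` instead of `ifelse()`, both of which need a reasonably recent data.table, and `id_cols` is a made-up name for the priority-ordered identifier columns.

```r
library(data.table)

samples <- data.table(name = letters[1:3], primary = c(17, 0, 18), secondary = c(55, 42, 42))
resources <- data.table(primary = 17:19, secondary = c(42, NA, 43), info = LETTERS[9:11])

# id_cols is a hypothetical name: identifier columns in order of priority
id_cols <- c("primary", "secondary")

# start with an empty info column so every step can test is.na(info)
samples[, info := NA_character_]

for (id in id_cols) {
  # update join on the current identifier;
  # only rows that are still NA take the value from resources
  samples[resources, on = id, info := fifelse(is.na(info), i.info, info)]
}

samples[, .(name, info)]
```

Because nothing is keyed, samples keeps its original row order. If samples itself can contain NA identifiers, it may be safer to drop the NA rows of resources for the current column before each join.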

+5

This is a single chain with two lookups into resources, one of which is re-keyed behind the scenes.

    library(data.table)
    samples <- data.table(name = letters[1:3], primary = c(17, 0, 18), secondary = c(55, 42, 42))
    resources <- data.table(primary = 17:19, secondary = c(42, NA, 43), info = LETTERS[9:11])
    setkey(samples, primary)
    setkey(resources, primary)
    samples[resources, info := i.info
            ][, .(name, info),, secondary
              ][resources[, info,, secondary], info := ifelse(is.na(info), i.info, info)
                ][, secondary := NULL]

Since you ask about more complex cases: data.table queries are easy to modularize by preparing the query arguments in advance as quoted expressions. These can then be swapped in or applied conditionally later. An example is below.

    lkp2 <- quote(resources[, info,, secondary])
    lkp2_formula <- quote(info := ifelse(is.na(info), i.info, info))
    setkey(samples, primary)
    samples[resources, info := i.info
            ][, .(name, info),, secondary
              ][eval(lkp2), eval(lkp2_formula)
                ][, secondary := NULL]

If you rely heavily on chained data.table processing, you may find the dtq package useful.

+2

I think it is too difficult to do this in one command chain, but I have a solution for you:

    ### First step
    samples[resources[samples, nomatch = 0], info := info]
    samples
    #    name primary secondary info
    # 1:    b       0        42   NA
    # 2:    a      17        55    I
    # 3:    c      18        42    J

    ### Second step
    setkey(samples, secondary)
    setkey(resources, secondary)
    ## create new column info1
    samples[resources[samples[is.na(info)], list(info1 = unique(info)), by = .EACHI], info1 := info1]
    ## merge it to samples, where info is NA
    samples[is.na(info), info := info1]
    ## remove info1 (and maybe other unused columns)
    samples[, info1 := NULL]
    ## sort samples by name
    setkey(samples, name)
    samples
    #    name primary secondary info
    # 1:    a      17        55    I
    # 2:    b       0        42    I
    # 3:    c      18        42    J
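
As a possible alternative to the two steps above, both lookups can be computed separately and then combined, taking the first non-missing value per row. This is a sketch rather than the method used in this answer: it assumes a data.table version that provides `fcoalesce()` and `on=` joins, and `by_primary` / `by_secondary` are illustrative names.

```r
library(data.table)

samples <- data.table(name = letters[1:3], primary = c(17, 0, 18), secondary = c(55, 42, 42))
resources <- data.table(primary = 17:19, secondary = c(42, NA, 43), info = LETTERS[9:11])

# one lookup per identifier; each returns one value per row of samples
# (NA where nothing matches, first match where several rows match)
by_primary   <- resources[samples, x.info, on = "primary",   mult = "first"]
by_secondary <- resources[samples, x.info, on = "secondary", mult = "first"]

# take the primary lookup where available, otherwise the secondary one
samples[, info := fcoalesce(by_primary, by_secondary)]

samples[, .(name, info)]
```

`fcoalesce()` accepts any number of vectors, so a tertiary lookup would simply be one more argument.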

HTH
~ R

+1
