Insert the string NA after each data group using data.table

I am trying to add the string NA after each data group in R.

A similar question was asked earlier: "Insert a blank line after each data group".

The accepted answer there also works in this case, as follows:

    group <- c("a","b","b","c","c","c","d","d","d","d")
    xvalue <- c(16:25)
    yvalue <- c(1:10)
    df <- data.frame(cbind(group,xvalue,yvalue))
    df_new <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
    head(do.call(rbind, by(df_new, df$group, rbind, NA)), -1 )

         group xvalue yvalue
    a.1      a     16      1
    a.2   <NA>   <NA>   <NA>
    b.2      b     17      2
    b.3      b     18      3
    b.31  <NA>   <NA>   <NA>
    c.4      c     19      4
    c.5      c     20      5
    c.6      c     21      6
    c.41  <NA>   <NA>   <NA>
    d.7      d     22      7
    d.8      d     23      8
    d.9      d     24      9
    d.10     d     25     10

How can I speed this up using data.table for a large data.frame?

2 answers

You can try:

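    library(data.table)

    df$group <- as.character(df$group)
    # Take one row past the end of each group (an all-NA row), blank out the
    # group label on those rows, then drop the final trailing NA row
    setDT(df)[, .SD[1:(.N+1)], by=group][is.na(xvalue), group:=NA][!.N]
    #     group xvalue yvalue
    #  1:     a     16      1
    #  2:    NA     NA     NA
    #  3:     b     17      2
    #  4:     b     18      3
    #  5:    NA     NA     NA
    #  6:     c     19      4
    #  7:     c     20      5
    #  8:     c     21      6
    #  9:    NA     NA     NA
    # 10:     d     22      7
    # 11:     d     23      8
    # 12:     d     24      9
    # 13:     d     25     10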

Or as suggested by @David Arenburg

  setDT(df)[, indx := group][, .SD[1:(.N+1)], indx][,indx := NULL][!.N] 

Or

  setDT(df)[df[,.I[1:(.N+1)], group]$V1][!.N] 

Or it can be further simplified based on @eddi's comments

  setDT(df)[df[, c(.I, NA), group]$V1][!.N] 
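
To see why this last form works, it may help to look at the intermediate index vector. The following is only a sketch on a small made-up table (the names dt, idx and res are illustrative and not from the answer above); it uses head() instead of [!.N] to drop the trailing row.

```r
library(data.table)

# Toy data mirroring the shape of the question's example (illustrative only)
dt <- data.table(
  group  = c("a", "b", "b", "c", "c", "c"),
  xvalue = 16:21
)

# Step 1: for each group, take its row numbers (.I) followed by a single NA
idx <- dt[, c(.I, NA), by = group]$V1
# idx is 1 NA 2 3 NA 4 5 6 NA

# Step 2: subsetting with an NA row number yields a row of NAs
res <- dt[idx]

# Step 3: drop the separator that follows the last group
res <- head(res, -1L)
print(res)
```

The grouping step only touches integer row numbers, so the actual columns are copied once, in the final subset. That is presumably why this variant is cheaper than building .SD for every group.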

One way that I could think of is to first build the vector as follows:

    foo <- function(x) {
        o = order(rep.int(seq_along(x), 2L))
        c(x, rep.int(NA, length(x)))[o]
    }
    join_values = head(foo(unique(df_new$group)), -1L)
    # [1] "a" NA  "b" NA  "c" NA  "d"
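
The interleaving relies on order() keeping tied keys in their original positions. A minimal base R sketch of the intermediate vectors, with illustrative variable names (keys, o, interleaved) that are not part of the answer:

```r
x <- c("a", "b", "c", "d")

# Each position appears twice: once for the value, once for its NA filler
keys <- rep.int(seq_along(x), 2L)
# keys is 1 2 3 4 1 2 3 4

# Ties keep their original order, so position i is followed by i + length(x)
o <- order(keys)
# o is 1 5 2 6 3 7 4 8

# Append the NA fillers, then reorder so each value is followed by an NA
interleaved <- c(x, rep.int(NA, length(x)))[o]
print(interleaved)
```

The result ends with an NA after the last value, which is why the answer wraps the call in head(..., -1L).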

Then use setkey() and join on that vector:

    setkey(setDT(df_new), group)
    df_new[.(join_values), allow.cartesian=TRUE]
    #     group xvalue yvalue
    #  1:     a     16      1
    #  2:    NA     NA     NA
    #  3:     b     17      2
    #  4:     b     18      3
    #  5:    NA     NA     NA
    #  6:     c     19      4
    #  7:     c     20      5
    #  8:     c     21      6
    #  9:    NA     NA     NA
    # 10:     d     22      7
    # 11:     d     23      8
    # 12:     d     24      9
    # 13:     d     25     10