R data.table group without aggregate

How can I get a data.table in R to return just a column of grouped values, without using any aggregate function? Say I have:

test<-data.table(x=c(rep("a",2),rep("b",3)),y=1:5) 

And I just want to get back:

 a b 

When I use:

 test[,,by=x] 

I get:

    x y
 1: a 1
 2: a 2
 3: b 3
 4: b 4
 5: b 5

And when I do:

 test[,x,by=x] 

I get:

    x x
 1: a a
 2: b b

I know I can use:

 test[,.(unique(x))] 

But that doesn't seem like the right way to do this, and besides, what if I wanted to return two grouped columns?

3 answers

I would do this by applying unique() to a data.table containing only the grouping columns I was interested in. Passing a data.table to unique(), as shown below, will call unique.data.table(), which works just as well for two or more columns as it does for one:

 unique(test[, list(x)]) ## or unique(test[, "x", with=FALSE])
 #    x
 # 1: a
 # 2: b

 ## Add another column to see that unique.data.table() works fine in that case as well
 test[, z:=c(1,1,1,2,2)]
 unique(test[, .(x,z)]) ## .() is data.table shorthand for list()
 #    x z
 # 1: a 1
 # 2: b 1
 # 3: b 2
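A related sketch, not part of the original answer: assuming a data.table version in which unique() accepts a `by` argument, the uniqueness check can be restricted to the grouping columns first and the columns selected afterwards. The names `first_rows` and `groups` are only illustrative.

```r
library(data.table)

# Same toy table as in the question, with the extra z column already included
test <- data.table(x = c(rep("a", 2), rep("b", 3)), y = 1:5, z = c(1, 1, 1, 2, 2))

# by= restricts the uniqueness check to the named columns and keeps the
# first row of each distinct (x, z) combination, with all columns still present
first_rows <- unique(test, by = c("x", "z"))

# Keeping only the grouping columns afterwards gives the same rows as
# unique(test[, .(x, z)])
groups <- first_rows[, .(x, z)]
print(groups)
```

This form can be handy when the first row of each group is also wanted; if only the distinct combinations matter, subsetting the columns first is the more direct route.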

Agreeing with Josh that unique() is the right choice, but perhaps consider this approach:

 > unique(test$x)
 [1] "a" "b"

Also, if you need them as a row:

 > rbind(unique(test$x))
      [,1] [,2]
 [1,] "a"  "b"

Or as a column:

 > cbind(unique(test$x))
      [,1]
 [1,] "a"
 [2,] "b"

Late to the party, but I know what you're asking.

There is no direct answer, but here is a workaround.

 test[,x,by=x][,x] # Suppress one of the x's
 [1] "a" "b"

According to the data.table FAQ, invisible() should also work:

I'm using j only for its side effect, but I'm still getting data returned. How do I stop that? In this case, j can be wrapped with invisible(); for example, DT[, invisible(hist(colB)), by = colA]. http://datatable.r-forge.r-project.org/datatable-faq.pdf

In practice, though, it does not help here:

 test[,invisible(x),by=x] # Still prints j, just hides its name!
    x V1
 1: a  a
 2: b  b

However, the following may make you happily abandon the quest:

Why is grouping by columns in the key faster than an ad hoc by?

Because each group is contiguous in RAM, which minimizes page fetches, and memory can be copied in bulk (memcpy in C) rather than in a C loop. http://datatable.r-forge.r-project.org/datatable-faq.pdf
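For context, here is a minimal sketch of what grouping on a key looks like. It is not taken from the FAQ; the deliberately unsorted toy data and the names `counts` and `groups` are assumptions made for illustration.

```r
library(data.table)

# Toy table with x deliberately out of order
test <- data.table(x = c("b", "a", "b", "a", "b"), y = 1:5)

# setkey() sorts the table by x in place, so the rows of each group
# end up next to each other
setkey(test, x)
print(key(test))

# Grouping on the key column; .N is used only to have something in j
counts <- test[, .N, by = x]
print(counts)

# Distinct key values without any aggregate
groups <- unique(test, by = key(test))[, .(x)]
print(groups)
```

On a table this small the key makes no measurable difference; the point is only to show where the contiguous layout described in the FAQ comes from.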
