data.table: does setkey(...) create an index or physically reorder the rows of a data.table?

This (very simple) question is the result of an exchange here.

The documentation for setkey() states:

setkey() sorts a data.table and marks it as sorted. The sorted columns are the key. The key can be any columns in any order. The columns are always sorted in ascending order. The table is changed by reference ... (emphasis mine)
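To see what "marks it as sorted" refers to, here is a minimal sketch, assuming the data.table package is installed. The column values are made up for illustration: the key is stored in the "sorted" attribute, and the sort is ascending on every key column, in the order the columns were given.

```r
library(data.table)

DT <- data.table(a = c(3L, 1L, 2L), b = c(1L, 2L, 1L))

# The key can be any columns in any order; here b first, then a.
setkey(DT, b, a)

key(DT)             # "b" "a"
attr(DT, "sorted")  # same vector: key() reads the "sorted" attribute
DT$b                # 1 1 2: ascending by b
DT$a                # 2 3 1: ties in b are broken by a, also ascending
```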

I have always interpreted this to mean that setkey() creates an index rather than physically rearranging the rows of the data.table (similar to an index on a database table). But if that were true, then removing the key (with setkey(DT, NULL)) should remove the index and leave the data.table in its original, unsorted order. That is not what happens:

    library(data.table)
    DT <- data.table(a=3:1, b=1:3, c=5:7); DT
    #    a b c
    # 1: 3 1 5
    # 2: 2 2 6
    # 3: 1 3 7
    setkey(DT, a); DT
    #    a b c
    # 1: 1 3 7
    # 2: 2 2 6
    # 3: 3 1 5
    setkey(DT, NULL); DT
    #    a b c
    # 1: 1 3 7
    # 2: 2 2 6
    # 3: 3 1 5
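The same observation can be checked programmatically. The sketch below assumes a data.table version that exports setorder(); the column orig_row is a hypothetical helper, not something setkey() creates. Because removing the key does not undo the sort, one way to get the original order back is to record it yourself before keying and reorder by it afterwards.

```r
library(data.table)

DT <- data.table(a = 3:1, b = 1:3, c = 5:7)
DT[, orig_row := .I]  # hypothetical helper column: remember the original row order

setkey(DT, a)
DT$a  # 1 2 3: the rows themselves were moved

setkey(DT, NULL)
key(DT)  # NULL: the key is gone...
DT$a     # ...but the order is still 1 2 3

setorder(DT, orig_row)  # reorder by reference using the saved row numbers
DT[, orig_row := NULL]  # drop the helper column again
DT$a  # 3 2 1: original order restored
```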

So, two questions:

1: If the rows are physically rearranged (sorted), then what does "changed by reference" mean?

2: What does setkey(DT,NULL) do exactly?

Tags: r, data.table
Asked Nov 19 '13 at 16:13
Answer
  • The rows are sorted, i.e. physically reordered. "Changed by reference" here means that no copy of the whole table is made: the rows are rearranged in place, using only temporary working memory.

  • setkey(DT, NULL) is equivalent to setattr(DT, "sorted", NULL). It simply removes the "sorted" attribute (the key); the rows stay in the order the earlier setkey() call left them.
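Both points can be checked directly. This is a minimal sketch, assuming the data.table package is installed; it relies on data.table's exported address(), copy() and setattr() functions. The first part shows that setkey() modifies the very object it is given (a second name bound by plain assignment sees the new row order, and the memory address does not change); the second part shows that setkey(DT, NULL) and setattr(DT, "sorted", NULL) leave a table in the same state.

```r
library(data.table)

# Point 1: setkey() reorders the rows in place ("by reference").
DT <- data.table(a = 3:1, b = 1:3)
DT_alias <- DT              # plain assignment: no copy, both names refer to one object
addr_before <- address(DT)  # memory address of the data.table before keying

setkey(DT, a)

identical(address(DT), addr_before)  # TRUE: still the same object
DT_alias$a                           # 1 2 3: the other name sees the new row order too

# Point 2: setkey(DT, NULL) only drops the "sorted" attribute.
DT1 <- data.table(a = 3:1, b = 1:3)
setkey(DT1, a)
DT2 <- copy(DT1)              # independent deep copy; the key is copied as well

setkey(DT1, NULL)             # remove the key the usual way
setattr(DT2, "sorted", NULL)  # remove the attribute directly

key(DT1)  # NULL
key(DT2)  # NULL
DT1$a     # 1 2 3: rows stay where setkey(DT1, a) put them
```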

Answered Nov 19 '13 at 16:54