Cancel setkey() on a data.table in R

Question

I have a data.table (called data below) with 10 columns (C1, ..., C10), and I want to delete duplicate rows.

I accidentally ran setkey(data, C1), so now unique(data) returns rows that are unique on column C1 only, whereas I want to drop a row only if it is identical to another row in all of the columns C1, ..., C10.
Is there any way to cancel the setkey() operation? I found this question, but it did not help solve my problem.

PS: I can work around the problem by setting all the columns of my data.table as keys with setkeyv(data, paste0("C", 1:10)), but this is not an elegant or practical solution.

+7

r duplicates key data.table

hellter Jun 04 '16 at 9:27

1 answer

MichaelChirico · Accepted Answer · 2016-06-04T09:33:39+0000

First, you can use setkey(data, NULL) to delete the key.
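As a minimal sketch of what removing the key looks like, the snippet below uses a small made-up table (three columns instead of ten; the values are purely illustrative) and checks the key with key() and haskey() before and after:

```r
library(data.table)

# Hypothetical toy table standing in for the 10-column data
data <- data.table(C1 = c(2, 1, 1),
                   C2 = c("b", "a", "a"),
                   C3 = c(FALSE, TRUE, TRUE))

setkey(data, C1)     # the accidental key; this also sorts the rows by C1
key(data)            # "C1"
haskey(data)         # TRUE

setkey(data, NULL)   # remove the key by reference
key(data)            # NULL
haskey(data)         # FALSE
```

Note that removing the key only drops the key attribute. The rows stay in the order that setkey(data, C1) sorted them into, so the original row order is not restored.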

Second, unique.data.table has a by argument, which lets you specify on the fly which columns to use for the comparison (regardless of which key is currently set):

 unique(data, by = paste0("C", 1:10))
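To make the effect of by concrete, here is a small sketch on made-up data (two columns rather than ten; the values are assumptions for illustration only). The table stays keyed on C1 throughout, and only the by argument changes:

```r
library(data.table)

# Hypothetical toy data: rows 1 and 2 are identical in every column,
# row 3 shares C1 with them but differs in C2
data <- data.table(C1 = c(1, 1, 1, 2),
                   C2 = c("a", "a", "b", "c"))
setkey(data, C1)

by_all <- unique(data, by = names(data))  # compare on every column
by_c1  <- unique(data, by = "C1")         # compare on C1 only

nrow(by_all)  # 3
nrow(by_c1)   # 2
```

When the table contains only the columns of interest, names(data) is a convenient alternative to paste0("C", 1:10). As far as I know, data.table 1.9.8 and later default by to all columns, so on those versions a plain unique(data) should already ignore the key. Passing by explicitly works either way.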

Third, instead of passing many columns to setkey, use setkeyv, which takes a character vector of column names:

 setkeyv(data, paste0("C", 1:10))

Reading ?setkey and ?unique.data.table in full should clear all of this up.
