How to remove duplicated (by name) column in data.tables in R?

Question

When reading a dataset with fread, I noticed that I sometimes get duplicated column names (fread has no check.names argument). For example:

 > data.table(x = 1, x = 2)
    x x
 1: 1 2

Question: is there a way to remove 1 of 2 columns if they have the same name?

+7

r data.table

Marcin Kosiński Mar 16 '15 at 21:45


3 answers

Using with=FALSE returns a copy of the columns you selected. Instead, simply remove the duplicated columns by reference using := .

 dt[, which(duplicated(names(dt))) := NULL]
 #    x
 # 1: 1
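
A minimal, self-contained sketch of this approach, assuming a toy table with an extra column y (not part of the question). The length check is an added precaution for tables that contain no duplicated names:

```r
library(data.table)

# Toy table (assumed for illustration); data.table() keeps duplicated names by default
dt <- data.table(x = 1, x = 2, y = 3)

# Positions of every column whose name already appeared further to the left
dup_idx <- which(duplicated(names(dt)))

# Drop those columns by reference (no copy); skip when nothing is duplicated
if (length(dup_idx) > 0L) dt[, (dup_idx) := NULL]

names(dt)
```

Because the columns are addressed by position rather than by name, only the later occurrences are removed and the first column of each name is kept.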

+7

Arun Mar 17 '15 at 12:39


Different approaches:

1. Indexing
   my.data.table <- my.data.table[, -2, with = FALSE]
2. Subsetting
   my.data.table <- subset(my.data.table, select = -2)
3. Create unique names if 1. and 2. are not ideal (for example, if you have hundreds of columns)
   setnames(my.data.table, make.names(names = names(my.data.table), unique = TRUE))
4. Optionally, systematize the removal of variables whose names meet some criterion (here we drop every variable whose name ends in ".X", where X is the number, starting at 1, that make.names appends to duplicated names)
   my.data.table <- subset(my.data.table, select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
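
The renaming approach above (make.names followed by a pattern-based drop) can be sketched end to end on a toy table; the table and the names renamed and keep are assumptions for illustration. Note that the pattern would also drop any genuine column whose name happens to end in a dot followed by digits.

```r
library(data.table)

# Toy table with a duplicated name (assumed for illustration)
dt <- data.table(x = 1, x = 2, y = 3)

# Make the names unique in place: later duplicates get a ".1", ".2", ... suffix
setnames(dt, make.names(names(dt), unique = TRUE))
renamed <- names(dt)

# Keep only the columns whose name does not end in ".<digits>"
keep <- which(!grepl("\\.\\d+$", names(dt)))
dt <- dt[, keep, with = FALSE]

names(dt)
```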

+3

Dominic Comtois Mar 16 '15 at 21:48


Ben Bolker · Accepted Answer · 2015-03-16T21:53:03+0000

What about

 dt[,unique(names(dt)),with=FALSE]

? From ?data.table :

j: A single column name, a single expression of column names, list() of expressions of column names, an expression or function call that evaluates to a list (including data.frame and data.table, which are lists too), or (when with=FALSE) a vector of names or positions to select.

This picks the first occurrence of each name (I'm not sure how you want to deal with this).
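
If the choice of occurrence matters, it can be made explicit. The following is a small sketch on a toy table (the table and the names first and last are assumed for illustration). It selects columns by position with duplicated(), whose fromLast argument switches between keeping the first and the last column of each name:

```r
library(data.table)

# Toy table with a duplicated name (assumed for illustration)
dt <- data.table(x = 1, x = 2, y = 3)

# Keep the first column of each name
first <- dt[, which(!duplicated(names(dt))), with = FALSE]

# Keep the last column of each name instead
last <- dt[, which(!duplicated(names(dt), fromLast = TRUE)), with = FALSE]
```

Both calls return a copy, like the with=FALSE selection above; the original dt is left unchanged.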

As @DavidArenburg points out in the comments above, you can use check.names=TRUE in data.table() (however, I don't see the check.names option in fread() - maybe I missed something).
