How to remove duplicated (by name) column in data.tables in R?

When reading a dataset with fread, I noticed that I sometimes get duplicate column names (fread has no check.names argument). For example:

 > data.table(x = 1, x = 2)
    x x
 1: 1 2

Question: is there a way to remove 1 of 2 columns if they have the same name?

+7
r data.table
3 answers

What about

 dt[,unique(names(dt)),with=FALSE] 

? From ?data.table :

j: A single column name, a single expression of column names, list() of expressions of column names, an expression or function call that evaluates to a list (including data.frame and data.table, which are lists too), or (when with=FALSE) a vector of names or positions to select.

This picks the first occurrence of each name (I'm not sure how you want to deal with this).

As @DavidArenburg points out in the comments above, you can use check.names=TRUE in data.table() (however, I don't see the check.names option in fread() - maybe I missed something).
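A minimal sketch of both ideas from this answer, assuming a toy table built with data.table() rather than read with fread (the names dt, first_only and renamed are made up for illustration):

```r
library(data.table)

# Toy table with a duplicated column name, as in the question
dt <- data.table(x = 1, x = 2, y = 3)

# Select one column per distinct name; this returns a new table
first_only <- dt[, unique(names(dt)), with = FALSE]
names(first_only)

# Alternative when building the table yourself: let data.table()
# make the names unique instead of dropping anything
renamed <- data.table(x = 1, x = 2, check.names = TRUE)
names(renamed)
```

The first approach discards the second x column, while check.names = TRUE keeps every column and only changes the names, so which one fits depends on whether the duplicated data is needed.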

+10

with=FALSE will return a copy of the columns you selected. Instead, simply remove the duplicated columns by reference using := .

 dt[, which(duplicated(names(dt))) := NULL]
 #    x
 # 1: 1
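A slightly more defensive sketch of the same idea, assuming a toy table with one extra column. The length check is my own addition (not part of the answer) so the deletion is skipped when there are no duplicates:

```r
library(data.table)

dt <- data.table(x = 1, x = 2, y = 3)

# Positions of every column whose name already appeared further left
dups <- which(duplicated(names(dt)))

# Skip the call entirely when nothing is duplicated, then delete in place
if (length(dups) > 0L) {
  dt[, (dups) := NULL]
}

names(dt)
```

Because := modifies dt in place, there is no need to assign the result back to dt.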
+7

Different approaches:

  • Indexing

    my.data.table <- my.data.table[ ,-2, with=FALSE]

  • Subsetting

    my.data.table <- subset(my.data.table, select = -2)

  • Create unique names if the two approaches above are not ideal (for example, if you have hundreds of columns)

    setnames(my.data.table, make.names(names = names(my.data.table), unique=TRUE))

  • Optionally, systematize the removal of variables whose names meet some criterion (here we get rid of all variables whose name ends in ".X", where X is the number, starting at 1, that make.names appends to duplicated names)

    my.data.table <- subset(my.data.table, select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
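The last two steps can be sketched end to end as follows, assuming a toy table with a column repeated three times (the names dt and keep are illustrative only):

```r
library(data.table)

dt <- data.table(id = 1, x = 2, x = 3, x = 4)

# Step 1: make the names unique; repeats get a numeric suffix
setnames(dt, make.names(names(dt), unique = TRUE))
names(dt)

# Step 2: keep only the columns whose name does not end in ".<digits>"
keep <- !grepl("\\.\\d+$", names(dt))
dt <- dt[, which(keep), with = FALSE]
names(dt)
```

Note that the pattern would also drop a legitimate column whose original name happens to end in a dot followed by digits (for example "v.1"), so this is only safe when no such names exist in the data.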

+3
