Key problem: when setattr is used to change level names, unwanted duplicate levels are kept.
I am cleaning some data in which several levels that should all be the same appear as two or more different levels. (These errors come mainly from typos and problems in the source files.) I have 153K factors, and 5% of them need to be fixed.
Example
In the following example, the vector has three levels, two of which must be folded into one.
incorrect <- factor(c("AOB", "QTX", "A_B"))
The vector is part of a data.table.
Everything works fine when using the levels<- function to change level names.
However, if you use setattr, the unwanted duplicates are retained.
library(data.table)

mydt1 <- data.table(id=1:3, incorrect, key="id")
mydt2 <- data.table(id=1:3, incorrect, key="id")

# assigning levels: duplicate levels are dropped (merged into one)
levels(mydt1$incorrect) <- gsub("_", "O", levels(mydt1$incorrect))

# using setattr: duplicate levels are not dropped
setattr(mydt2$incorrect, "levels", gsub("_", "O", levels(mydt2$incorrect)))

# RESULTS
# Assigning levels
> mydt1$incorrect
[1] AOB QTX AOB
Levels: AOB QTX

# Using `setattr`
> mydt2$incorrect
[1] AOB QTX AOB
Levels: AOB AOB QTX   <~~~ Notice the duplicate level
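The difference comes down to how a factor is stored: an integer vector of codes plus a character "levels" attribute. The base-R sketch below (no data.table needed; the variable names codes, labels, renamed and fixed are illustrative only) shows that renaming "A_B" to "AOB" produces a duplicated label, and that `levels<-` then remaps the codes so that equal labels share a single level. setattr appears to write the new attribute directly without remapping the codes, which would explain the duplicate in mydt2; treat that as an assumption about the data.table version used here.

```r
incorrect <- factor(c("AOB", "QTX", "A_B"))

# A factor stores integer codes plus a "levels" character attribute.
codes  <- as.integer(incorrect)
labels <- levels(incorrect)

# Renaming only rewrites the labels; "A_B" becomes a second "AOB".
renamed <- gsub("_", "O", labels)
anyDuplicated(renamed) > 0   # TRUE: two levels now share the label "AOB"

# levels<- sees the duplicated label and remaps the codes onto unique levels.
fixed <- incorrect
levels(fixed) <- renamed
fixed   # values AOB QTX AOB, levels AOB QTX
```

The original `incorrect` is left untouched here because R copies `fixed` on modification, whereas setattr works by reference on the column itself.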
Any thoughts on why this happens, and/or any options for changing this behavior (something like ..., droplevels=TRUE)? Thanks.
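One possible workaround, sketched here as an assumption rather than an existing data.table option, is to repair the factor after setattr has introduced duplicates. The helper merge_duplicate_levels below is hypothetical: it keeps the first occurrence of each label and remaps the integer codes with match(), which is the same idea `levels<-` applies in the example above.

```r
# Hypothetical helper (not part of data.table or base R): collapse duplicated
# labels in a factor's levels attribute, remapping the integer codes so that
# elements with equal labels end up sharing one level.
merge_duplicate_levels <- function(f) {
  stopifnot(is.factor(f))
  old_levels <- levels(f)                    # may contain duplicates, e.g. after setattr()
  new_levels <- unique(old_levels)           # first-occurrence order is kept
  code_map   <- match(old_levels, new_levels) # old code -> new code
  new_codes  <- code_map[as.integer(f)]       # NA codes stay NA
  factor(new_levels[new_codes], levels = new_levels)
}

# A factor with a duplicated level, built directly for illustration;
# it mimics the state of mydt2$incorrect after the setattr() call.
dup <- structure(c(1L, 3L, 2L), levels = c("AOB", "AOB", "QTX"), class = "factor")
repaired <- merge_duplicate_levels(dup)
```

On the table from the question the call would look like mydt2[, incorrect := merge_duplicate_levels(incorrect)]. This builds a new column vector rather than modifying the factor in place, so it gives up setattr's by-reference advantage for that column, and ordered factors would lose their ordered class with this sketch.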
r duplicate-removal data.table
Ricardo Saporta