Changing factor levels in a column with setattr is sensitive to how the column was created

I want to change a column's factor levels using setattr. However, when the column is selected in the standard data.table way (dt[ , col]), the levels are not updated. On the other hand, when the column is selected in a way that is unorthodox for data.table, namely with $, it works.

    library(data.table)

    # Some data
    d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
    d
    #    x y
    # 1: b 1
    # 2: a 2
    # 3: a 3
    # 4: b 4

    # We want to change levels of 'x' using setattr
    # New desired levels
    lev <- c("a_new", "b_new")

    # Select column in the standard data.table way
    setattr(x = d[ , x], name = "levels", value = lev)

    # Levels are not updated
    d
    #    x y
    # 1: b 1
    # 2: a 2
    # 3: a 3
    # 4: b 4

    # Select column in a non-standard data.table way using $
    setattr(x = d$x, name = "levels", value = lev)

    # Levels are updated
    d
    #        x y
    # 1: b_new 1
    # 2: a_new 2
    # 3: a_new 3
    # 4: b_new 4

    # Just check if d[ , x] really is the same as d$x
    d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
    identical(d[ , x], d$x)
    # [1] TRUE
    # Yes, it seems so

It looks like I am missing some basics of data.table (or R?). Can someone explain what is happening?


I found two more posts on setattr and levels:

setattr on levels saving unwanted duplicates (R data.table)

How to change factor column levels in a data table.

Both of them use $ to select a column. Neither of them mentions the [ , col] method.

1 answer

It may help to look at the memory address returned by each expression:

    address(d$x)
    # [1] "0x10e4ac4d8"
    address(d$x)
    # [1] "0x10e4ac4d8"

    address(d[,x])
    # [1] "0x105e0b520"
    address(d[,x])
    # [1] "0x105e0a600"

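Note that the address from the first expression does not change when you call it several times, whereas the address from the second expression is different on every call. That indicates that d[, x] makes a copy of the column, so setattr on it modifies only the temporary copy and does not affect the original data.table. This is also why identical(d[ , x], d$x) returns TRUE: identical compares values, not memory addresses.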
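To make the copy visible, here is a minimal sketch that binds the result of d[, x] to a name before calling setattr. The names x_copy, levels_after_copy and levels_after_ref are only illustrative and do not appear in the question.

```r
library(data.table)

d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
lev <- c("a_new", "b_new")

# d[, x] yields a copy; bind it to a name so the effect of setattr is visible.
x_copy <- d[, x]
setattr(x_copy, "levels", lev)
levels(x_copy)                    # the copy now carries the new levels
levels_after_copy <- levels(d$x)  # the column inside d still has "a", "b"

# d$x yields the column itself, so setattr changes d by reference.
setattr(d$x, "levels", lev)
levels_after_ref <- levels(d$x)   # now "a_new", "b_new"
```

In the question, the copy returned by d[ , x] was never assigned to anything, so the modified vector was simply discarded and d looked untouched.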
