How to convert a data frame of factors to numeric?

I have a data frame in which every column is a factor:

   V1 V2 V3
 1  a  b  c
 2  c  b  a
 3  c  b  c
 4  b  b  a

How can I convert all values in the data frame to numeric values in a new data frame (a to 1, b to 2, c to 3, etc.)?

+7
r
3 answers

I would try:

 > mydf[] <- as.numeric(factor(as.matrix(mydf)))
 > mydf
   V1 V2 V3
 1  1  2  3
 2  3  2  1
 3  3  2  3
 4  2  2  1
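A minimal sketch of why the columns are pooled through as.matrix() before conversion. It assumes the question's data is rebuilt with data.frame() and stringsAsFactors = TRUE (that construction, and the names per_column and pooled, are illustrative assumptions). Each column carries its own level set, so a column-by-column conversion numbers "b" in V2 as 1 rather than 2.

```r
# Rebuild the question's data; every column is a factor with its own levels.
mydf <- data.frame(
  V1 = c("a", "c", "c", "b"),
  V2 = c("b", "b", "b", "b"),
  V3 = c("c", "a", "c", "a"),
  stringsAsFactors = TRUE
)

# Per-column conversion uses each column's private levels:
# V2 has the single level "b", so it becomes 1 everywhere.
per_column <- sapply(mydf, as.integer)

# Pooling through a character matrix yields one shared level set (a, b, c).
pooled <- mydf
pooled[] <- as.numeric(factor(as.matrix(mydf)))

per_column[, "V2"]  # codes from V2's own levels
pooled$V2           # codes from the shared levels
```

The pooled version is the one that matches the mapping asked for in the question.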
+11

Converting a factor to numeric returns its integer codes. If a factor column has its levels specified in a different order, such as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer codes follow that order. To avoid this, we can call factor() again with the levels given explicitly, which is safer:

 df1[] <- lapply(df1, function(x) as.numeric(factor(x, levels=letters[1:3]))) 
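To illustrate the point about level order, here is a small sketch using a made-up factor x (hypothetical, not part of the question's data):

```r
# Hypothetical factor whose levels were declared in non-alphabetical order.
x <- factor(c("a", "b", "c"), levels = c("c", "b", "a"))

# The integer codes follow the declared level order, so "a" maps to 3.
as.numeric(x)
# [1] 3 2 1

# Re-declaring the levels pins the mapping a = 1, b = 2, c = 3.
as.numeric(factor(x, levels = letters[1:3]))
# [1] 1 2 3
```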

If we use data.table, one option is set(), which updates the columns by reference. This would be more efficient for large datasets, whereas converting to a matrix can cause memory problems.

 library(data.table)
 setDT(df1)
 for (j in seq_along(df1)) {
   set(df1, i = NULL, j = j,
       value = as.numeric(factor(df1[[j]], levels = letters[1:3])))
 }
+9

This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all your columns are already factors, unlist() will combine them into one factor vector with the corresponding levels.

So let's see what happens when we unlist() your data frame.

 unlist(df, use.names = FALSE)
 #  [1] a c c b b b b b c a c a
 # Levels: a b c

Now we can just run as.integer() on the code above, because the integer codes of the factor match your desired mapping. (c() also works in R versions before 4.1.0, where it drops the factor class; from 4.1.0 on, c() applied to a factor returns a factor.) The following therefore revalues your entire data frame.

 df[] <- as.integer(unlist(df, use.names = FALSE))
 ## note that in R < 4.1.0 you can also just drop the factor class with c()
 ## df[] <- c(unlist(df, use.names = FALSE))
 df
 #   V1 V2 V3
 # 1  1  2  3
 # 2  3  2  1
 # 3  3  2  3
 # 4  2  2  1

Note: use.names = FALSE is not required. However, dropping the names attribute makes this process more efficient.

Data:

 df <- structure(list(
   V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", "b", "c"), class = "factor"),
   V2 = structure(c(1L, 1L, 1L, 1L), .Label = "b", class = "factor"),
   V3 = structure(c(2L, 1L, 2L, 1L), .Label = c("a", "c"), class = "factor")),
   .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
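As a quick sanity check, the sketch below applies the as.matrix(), lapply() and unlist() approaches from the answers to the same data and compares the results. It rebuilds the data with data.frame() rather than structure() (an assumption made for brevity), and the names via_matrix, via_lapply and via_unlist are made up for illustration.

```r
# Same data as above, rebuilt with data.frame() for brevity.
df <- data.frame(
  V1 = c("a", "c", "c", "b"),
  V2 = rep("b", 4),
  V3 = c("c", "a", "c", "a"),
  stringsAsFactors = TRUE
)

# Approach 1: pool all values through a character matrix.
via_matrix <- df
via_matrix[] <- as.numeric(factor(as.matrix(df)))

# Approach 2: convert each column against explicitly stated levels.
via_lapply <- df
via_lapply[] <- lapply(df, function(x) as.numeric(factor(x, levels = letters[1:3])))

# Approach 3: unlist() the factor columns into one factor, then take the codes.
via_unlist <- df
via_unlist[] <- as.integer(unlist(df, use.names = FALSE))

# Compare the three results cell by cell.
all(as.matrix(via_matrix) == as.matrix(via_lapply))
all(as.matrix(via_matrix) == as.matrix(via_unlist))
```

On this data the three approaches agree. In general, only the lapply() version fixes the mapping explicitly; the other two rely on the pooled levels sorting or combining into the order a, b, c.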
+5
