How to convert factor dataframe to numeric?

Question


I have a data frame with all factor values

    V1 V2 V3
     a  b  c
     c  b  a
     c  b  c
     b  b  a

How can I convert all values in the data frame into a new one with numeric values (a to 1, b to 2, c to 3, etc.)?


r

mamatv Jan 01 '15 at 15:25


3 answers

A5C1D2H2I1M1N2O1R2T1 · Answer 1 · 2016-01-01T15:30:40+0000

I would try:

    > mydf[] <- as.numeric(factor(as.matrix(mydf)))
    > mydf
      V1 V2 V3
    1  1  2  3
    2  3  2  1
    3  3  2  3
    4  2  2  1
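To see why this one-liner works, here is a step-by-step sketch. It assumes the question's sample data, rebuilt inline as mydf so the snippet runs on its own; the intermediate names m, f and codes are only for illustration.

```r
# Sample data matching the question (rebuilt here, not taken from the answer)
mydf <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)

m <- as.matrix(mydf)     # character matrix with 4 rows and 3 columns
f <- factor(m)           # one factor over all 12 values, levels sorted: a, b, c
codes <- as.numeric(f)   # integer codes of that factor, in column-major order

# Empty brackets keep the data frame structure and refill it column by column
mydf[] <- codes
mydf
```

Because factor() sees all twelve values at once, every column is coded against the same a/b/c mapping.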

akrun · Answer 2 · 2016-01-01T15:27:02+0000

Converting a factor to numeric returns its underlying integer codes. If a factor column has its levels specified in another order, such as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the codes follow that order. To avoid this, we can set the levels explicitly by calling factor() again (safer):

 df1[] <- lapply(df1, function(x) as.numeric(factor(x, levels=letters[1:3])))
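A small sketch of the difference, assuming the question's sample data (rebuilt here as df1; the names naive and shared are just illustrative). Without explicit levels, each column is coded against its own level set, so the same letter can get different numbers in different columns:

```r
# Sample data matching the question (rebuilt so the block is self-contained)
df1 <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)

# Naive per-column conversion: V2 only has the level "b", so "b" becomes 1,
# and V3 only has "a" and "c", so "c" becomes 2
naive <- df1
naive[] <- lapply(df1, as.integer)

# Shared levels: every column is coded against the same a/b/c mapping
shared <- df1
shared[] <- lapply(df1, function(x) as.numeric(factor(x, levels = letters[1:3])))

naive
shared
```

Only the shared version gives b = 2 and c = 3 in every column.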

If we use data.table, one option is set(), which is more efficient for large datasets. Converting to a matrix can cause memory problems.

    library(data.table)
    setDT(df1)
    for(j in seq_along(df1)){
      set(df1, i=NULL, j=j, value= as.numeric(factor(df1[[j]], levels= letters[1:3])))
    }

Rich Scriven · Answer 3 · 2016-01-01T16:15:43+0000

This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all your columns are already factors, unlist() will combine them into a single factor vector whose levels are the union of the columns' levels.

So let's see what happens when we unlist() your data frame.

    unlist(df, use.names = FALSE)
    # [1] a c c b b b b b c a c a
    # Levels: a b c

Now we can simply run as.integer() (or c()) on the result above, because the integer codes of the factor match your desired mapping. The following will therefore recode your entire data frame.

    df[] <- as.integer(unlist(df, use.names = FALSE))
    ## note: in R versions before 4.1.0, c() also dropped the factor class
    ## df[] <- c(unlist(df, use.names = FALSE))
    df
    #   V1 V2 V3
    # 1  1  2  3
    # 2  3  2  1
    # 3  3  2  3
    # 4  2  2  1

Note: use.names = FALSE is not required. However, dropping the names attribute makes the process more efficient.

Data:

    df <- structure(list(
      V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", "b", "c"), class = "factor"),
      V2 = structure(c(1L, 1L, 1L, 1L), .Label = "b", class = "factor"),
      V3 = structure(c(2L, 1L, 2L, 1L), .Label = c("a", "c"), class = "factor")),
      .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
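One thing to watch for with the unlist() approach: per ?unlist, the combined levels follow the order in which they occur across the columns' level sets, not alphabetical order. The sketch below uses a made-up two-column data frame (df2, not from the question) where that matters, and shows one possible guard: re-levelling before taking the codes. The names combined and safe are illustrative only.

```r
# Hypothetical data: the first column has no "a", so "b" is the first level seen
df2 <- data.frame(
  V1 = factor(c("b", "c")),
  V2 = factor(c("a", "c"))
)

combined <- unlist(df2, use.names = FALSE)
levels(combined)       # level order follows the columns, not the alphabet
as.integer(combined)   # codes follow that order, so "a" is not 1 here

# Possible guard: rebuild the factor with sorted levels before taking codes
safe <- as.integer(factor(combined, levels = sort(levels(combined))))
df2[] <- safe
df2
```

With the question's data this is not an issue, because the first column already contains all three levels in alphabetical order.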
