Change the class from factor to numeric for many columns in a data frame

Question

What is the quickest / best way to change a large number of columns from factor to numeric?

I used the following code, but it seems to have re-ordered my data.

 > head(stats[,1:2])
   rk                 team
 1  1 Washington Capitals*
 2  2     San Jose Sharks*
 3  3  Chicago Blackhawks*
 4  4     Phoenix Coyotes*
 5  5   New Jersey Devils*
 6  6  Vancouver Canucks*

 for(i in c(1,3:ncol(stats))) {
   stats[,i] <- as.numeric(stats[,i])
 }

 > head(stats[,1:2])
   rk                 team
 1  2 Washington Capitals*
 2 13     San Jose Sharks*
 3 24  Chicago Blackhawks*
 4 26     Phoenix Coyotes*
 5 27   New Jersey Devils*
 6 28  Vancouver Canucks*

What is the best way without naming each column, as in:

 df$colname <- as.numeric(df$colname)

+74

r

Btibert3 Sep 26 '10

16 answers

You have to be careful when changing factors to numeric. Here is a line of code that changes a set of columns from factor to numeric. I am assuming here that the columns to be changed are columns 1, 3, 4, and 5. You can change this accordingly.

 cols = c(1, 3, 4, 5)
 df[,cols] = apply(df[,cols], 2, function(x) as.numeric(as.character(x)))
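
For anyone who wants to try this without the original data, here is a minimal sketch on made-up data. The data frame `scores` and its columns are hypothetical, and it uses lapply() in place of apply(), which is a swapped-in choice that skips the intermediate matrix conversion:

```r
# Hypothetical example data: numbers stored as factors.
scores <- data.frame(
  rk   = factor(c("10", "2", "33")),
  team = c("A", "B", "C"),
  pts  = factor(c("121", "113", "112"))
)

cols <- c(1, 3)  # positions of the factor columns to convert

# Go through character first so the labels, not the internal codes, are used.
scores[cols] <- lapply(scores[cols], function(x) as.numeric(as.character(x)))

str(scores)
```

The `team` column is left untouched because it is not listed in `cols`.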

+68

Ramnath Sep 26 '10 at 5:06

This can be done in one line; there is no need for a loop, be it a for loop or an apply. Use unlist() instead:

 # test data
 Df <- data.frame(
   x = as.factor(sample(1:5, 30, replace = TRUE)),
   y = as.factor(sample(1:5, 30, replace = TRUE)),
   z = as.factor(sample(1:5, 30, replace = TRUE)),
   w = as.factor(sample(1:5, 30, replace = TRUE))
 )
 ##
 Df[,c("y","w")] <- as.numeric(as.character(unlist(Df[,c("y","w")])))
 str(Df)

Edit: for your code, this becomes:

 id <- c(1,3:ncol(stats))
 stats[,id] <- as.numeric(as.character(unlist(stats[,id])))

Obviously, if you have a one-column data frame and you do not want R's automatic dimension reduction to convert it to a vector, you will have to add the argument drop=FALSE.
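
To illustrate the drop=FALSE point, here is a small sketch on a hypothetical one-column data frame (`Df1` and its values are made up for the example):

```r
# Hypothetical one-column data frame with a factor column.
Df1 <- data.frame(y = factor(c("7", "3", "9")))

one_col <- Df1[, "y", drop = FALSE]  # stays a data frame
as_vec  <- Df1[, "y"]                # dimension is dropped: a plain factor

class(one_col)
class(as_vec)

# The unlist() conversion works on the one-column data frame as well.
Df1[, "y"] <- as.numeric(as.character(unlist(Df1[, "y", drop = FALSE])))
str(Df1)
```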

+34

Joris Meys Sep 26 '10 at

I know this question was answered long ago, but I recently had a similar problem and think I have found a slightly more elegant and functional solution, although it requires the magrittr package.

 library(magrittr)
 cols = c(1, 3, 4, 5)
 df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

The %<>% operator pipes and reassigns, which is very useful for keeping data cleaning and transformation simple. The lapply call is now much easier to read, since you only specify the function you want to apply.

+28

Dan Apr 03 '16 at 13:07

I think ucfagls found why your loop is not working.

If you still don't want to use a loop, here is a solution with lapply:

 factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)]
 cols <- c(1, 3:ncol(stats))
 stats[cols] <- lapply(stats[cols], factorToNumeric)

Edit: I found a simpler solution. It seems that as.matrix converts to character. So

 stats[cols] <- as.numeric(as.matrix(stats[cols]))

should do what you want.

+6

Marek Sep 26 '10 at 15:04

lapply is pretty much designed for this:

 unfactorize <- c("colA", "colB")
 df[,unfactorize] <- lapply(unfactorize, function(x) as.numeric(as.character(df[,x])))

+5

transcom Jan 07 '14 at 17:48

I found this function on a couple of other duplicate threads and found it an elegant and general way to solve this problem. This thread shows up first in most searches on this topic, so I am sharing it here to save people some time. I take no credit for this, so see the original posts here and here.

 df <- data.frame(x = 1:10,
                  y = rep(1:2, 5),
                  k = rnorm(10, 5, 2),
                  z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                  j = c(rep(c("a", "b", "c"), 3), "d"))

 # Convert every column of obj to the requested type.
 convert.magic <- function(obj, type) {
   FUN1 <- switch(type,
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor)
   out <- lapply(obj, FUN1)
   as.data.frame(out)
 }

 str(df)
 str(convert.magic(df, "character"))
 str(convert.magic(df, "factor"))
 df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor")

+2

Electioneer Dec 08 '16 at 16:56

I would like to point out that if you have NAs in any column, simply indexing the columns will not work. If there are NAs in the factor, you must use the apply approach provided by Ramnath.

E.g.:

 Df <- data.frame(
   x = c(NA, as.factor(sample(1:5, 30, replace = TRUE))),
   y = c(NA, as.factor(sample(1:5, 30, replace = TRUE))),
   z = c(NA, as.factor(sample(1:5, 30, replace = TRUE))),
   w = c(NA, as.factor(sample(1:5, 30, replace = TRUE)))
 )
 Df[,c(1:4)] <- as.numeric(as.character(Df[,c(1:4)]))

Returns the following:

 Warning message:
 NAs introduced by coercion
 > head(Df)
    x  y  z  w
 1 NA NA NA NA
 2 NA NA NA NA
 3 NA NA NA NA
 4 NA NA NA NA
 5 NA NA NA NA
 6 NA NA NA NA

But:

 Df[,c(1:4)] = apply(Df[,c(1:4)], 2, function(x) as.numeric(as.character(x)))

Returns:

 > head(Df)
    x  y  z  w
 1 NA NA NA NA
 2  2  3  4  1
 3  1  5  3  4
 4  2  3  4  1
 5  5  3  5  5
 6  4  2  4  4

+1

Elizabeth Feb 26 '16 at 21:46

You can use unfactor() from the CRAN package "varhandle":

 library("varhandle")
 my_iris <- data.frame(Sepal.Length = factor(iris$Sepal.Length),
                       sample_id = factor(1:nrow(iris)))
 my_iris <- unfactor(my_iris)

+1

Mehrad Mahmoudian Aug 2 '18 at 12:14

I like this code because it is quite convenient:

 # change all variables to their best-fitting data type
 data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE))

This is not exactly what was asked for (conversion to numeric), but in many cases it is even more appropriate.
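
A quick sketch of what this does on mixed columns; the data frame `mixed` is made up for illustration:

```r
# Hypothetical data: every column starts out as a factor.
mixed <- data.frame(
  n = factor(c("1", "2", "3")),        # whole numbers
  d = factor(c("1.5", "2.5", "3.5")),  # decimals
  s = factor(c("a", "b", "c"))         # text
)

# type.convert() picks a type per column; as.is = TRUE keeps text as character.
mixed[] <- lapply(mixed, function(x) type.convert(as.character(x), as.is = TRUE))

sapply(mixed, class)
```

Whole numbers come back as integer rather than double, so wrap the result in as.numeric() if the exact class matters.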

+1

SDahm Dec 14 '18 at 9:47

Here are a few dplyr options:

 library(dplyr)

 # by column type:
 df %>%
   mutate_if(is.factor, ~as.numeric(as.character(.)))

 # by specific columns:
 df %>%
   mutate_at(vars(x, y, z), ~as.numeric(as.character(.)))

 # all columns:
 df %>%
   mutate_all(~as.numeric(as.character(.)))

+1

sbha Mar 21 '19 at 1:47

I had problems converting all columns to numeric with an apply() call:

 apply(data, 2, as.numeric)

The problem turned out to be that some of the strings had a comma in them, e.g. "1,024.63" instead of "1024.63", and R does not like this way of formatting numbers. So I removed the commas and then ran as.numeric():

 library(stringr)

 data = as.data.frame(apply(data, 2, function(x) {
   y = str_replace_all(x, ",", "")  # remove commas
   return(as.numeric(y))            # then convert
 }))

Note that this requires the stringr package to be loaded.
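
If you would rather not depend on stringr, the same idea can be sketched with base R's gsub(). This is a swapped-in alternative to the answer's code, and the data frame `raw` is hypothetical:

```r
# Hypothetical data: numbers stored as text with thousands separators.
raw <- data.frame(
  price = c("1,024.63", "12.50", "3,000"),
  qty   = c("1,000", "25", "7"),
  stringsAsFactors = FALSE
)

# Strip the commas literally (fixed = TRUE), then convert each column.
raw[] <- lapply(raw, function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))

str(raw)
```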

0

Deleet May 20 '15 at 18:11

This is what worked for me. The apply() function tries to coerce df to a matrix, and it returned NAs.

 numeric.df <- as.data.frame(sapply(df, as.numeric))

0

Alina Shabatov Dec 07 '16 at 7:05

Based on @SDahm's answer, this was the "optimal" solution for my tibble:

 data %<>% lapply(type.convert) %>% as.data.table()

This requires magrittr (for %<>% and %>%) and data.table (for as.data.table()).

0

James Hirschorn Dec 17 '18 at 23:54

I tried a bunch of the solutions from similar questions and kept getting NAs. Base R has some really annoying coercion behaviors that are generally fixed in the Tidyverse packages. I used to avoid them because I didn't want to create dependencies, but they make life so much easier that now I don't even bother looking for a base R solution most of the time.

Here is the Tidyverse solution, which is extremely simple and elegant:

 library(purrr)

 mydf <- data.frame(
   x1 = factor(c(3, 5, 4, 2, 1)),
   x2 = factor(c("A", "C", "B", "D", "E")),
   x3 = c(10, 8, 6, 4, 2))

 map_df(mydf, as.numeric)

0

Aaron Cooley Feb 04 '19 at 16:07

df$colname <- as.numeric(df$colname)

I tried this way of changing the type of a single column, and I think it is better than many of the other versions if you are not going to change every column's type.

df$colname <- as.character(df$colname)

for the opposite.

0

huseyn rahimov Jun 20 '19 at 13:13

Gavin Simpson · Accepted Answer · 2010-09-26 10:10

Further to Ramnath's answer, the behavior you are experiencing is due to as.numeric(x) returning the internal numeric representation of the factor x at the R level. If you want to keep the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first, as in Ramnath's example.

Your for loop is just as reasonable as an apply call, and may be slightly more readable as to what the code intends. Just change this line:

 stats[,i] <- as.numeric(stats[,i])

to read

 stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.
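
A minimal sketch of the difference, using a made-up factor `f`. The last line uses the as.numeric(levels(f))[f] idiom, which is equivalent to going through as.character():

```r
# Hypothetical factor whose labels are numbers.
# Levels sort as strings: "10" < "2" < "33".
f <- factor(c("10", "2", "33", "2"))

as.numeric(f)                # internal codes:  1 2 3 2
as.numeric(as.character(f))  # the labels:     10 2 33 2
as.numeric(levels(f))[f]     # same result:    10 2 33 2
```

The codes are just positions in levels(f), which is why the converted `rk` column in the question looked scrambled.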

HTH
