Convert multiple columns of a data frame at once

It seems that I spend a lot of time creating a data frame from a file, database, or something else, and then converting each column to the type I want (numeric, factor, character, etc.). Is there a way to do this in one step, perhaps by supplying a vector of types?

    foo <- data.frame(x = c(1:10),
                      y = c("red", "red", "red", "blue", "blue",
                            "blue", "yellow", "yellow", "yellow", "green"),
                      z = Sys.Date() + c(1:10))
    foo$x <- as.character(foo$x)
    foo$y <- as.character(foo$y)
    foo$z <- as.numeric(foo$z)

Instead of the last three commands, I would like to do something like

 foo<-convert.magic(foo, c(character, character, numeric)) 
type-conversion r
Oct 06 2018-11-11T00:
10 answers

Edit: See this related question for some simplifications and extensions of this basic idea.

Building on my comment on Brandon's answer, using switch:

    convert.magic <- function(obj, types) {
      for (i in 1:length(obj)) {
        FUN <- switch(types[i],
                      character = as.character,
                      numeric = as.numeric,
                      factor = as.factor)
        obj[, i] <- FUN(obj[, i])
      }
      obj
    }

    out <- convert.magic(foo, c('character', 'character', 'numeric'))
    > str(out)
    'data.frame': 10 obs. of 3 variables:
     $ x: chr "1" "2" "3" "4" ...
     $ y: chr "red" "red" "red" "blue" ...
     $ z: num 15254 15255 15256 15257 15258 ...

For really large data frames, you can use lapply instead of the for loop:

    convert.magic1 <- function(obj, types) {
      out <- lapply(1:length(obj), FUN = function(i) {
        FUN1 <- switch(types[i],
                       character = as.character,
                       numeric = as.numeric,
                       factor = as.factor)
        FUN1(obj[, i])
      })
      names(out) <- colnames(obj)
      as.data.frame(out, stringsAsFactors = FALSE)
    }

When doing this, be aware of some of the subtleties of coercing data in R. For example, converting from a factor to numeric often requires as.numeric(as.character(...)). Also, be mindful of the default behavior of data.frame() and as.data.frame(), which convert character columns to factors.
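As a minimal sketch of the factor pitfall mentioned above (the vector f is made up for illustration): calling as.numeric() directly on a factor returns the internal level codes, not the printed values. Note also that the character-to-factor default of data.frame() applies to R versions before 4.0.0, where stringsAsFactors defaulted to TRUE.

```r
# Hypothetical example factor; the levels are sorted as "10", "20", "30"
f <- factor(c("10", "20", "30", "20"))

# Direct conversion returns the integer level codes, not the labels
codes <- as.numeric(f)

# Going through character recovers the values the labels represent
values <- as.numeric(as.character(f))

# Equivalent idiom: convert the levels once, then index by the factor
values2 <- as.numeric(levels(f))[f]

print(codes)   # level codes
print(values)  # original numeric values
```

The second idiom converts only the distinct levels, so it does less work when a factor has many rows but few levels.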

Oct 06 2018-11-22T00:

If you want to automatically determine the data type of the columns, and not manually specify it (for example, after processing the data, etc.), the type.convert() function can help.

The type.convert() function takes a character vector and tries to determine the optimal type for all elements (which means that it should be applied once per column).

 df[] <- lapply(df, function(x) type.convert(as.character(x))) 

Since I love dplyr, I prefer:

    library(dplyr)
    df <- df %>% mutate_all(funs(type.convert(as.character(.))))
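A small base R illustration of what type.convert() guesses, using a made-up data frame. Passing as.is = TRUE is an assumption added here so that text columns stay character rather than becoming factors; drop it if factors are what you want.

```r
# Hypothetical input: every column arrives as character
df <- data.frame(a = c("1", "2", "3"),
                 b = c("1.5", "2.5", "x"),
                 c = c("TRUE", "FALSE", NA),
                 stringsAsFactors = FALSE)

# Apply type.convert() once per column; df[] keeps the data frame structure
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))

# Inspect the class chosen for each column
print(sapply(df, class))
```

Column b stays character because one of its entries ("x") cannot be read as a number, which shows that the guess is made for the column as a whole.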
Jul 13 '15 at 7:28

I find myself running into this a lot as well. It comes down to how you import the data. All of the read...() functions have some option to not convert character strings to factors. That means text strings stay character and things that look like numbers stay numeric. A problem arises when you have elements that are empty rather than NA. But again, na.strings = c("", ...) should solve that as well. I would start by taking a hard look at your import process and adjusting it accordingly.
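To make the import advice concrete, here is a sketch using read.csv() on an inline string. The csv_text contents and column names are invented for illustration; stringsAsFactors and na.strings are the standard read.csv() arguments referred to above.

```r
# Hypothetical CSV content with empty fields in both a text and a numeric column
csv_text <- "id,colour,score\n1,red,3.5\n2,,4.0\n3,blue,\n"

# Keep strings as character and treat empty fields as NA at import time
dat <- read.csv(text = csv_text,
                stringsAsFactors = FALSE,
                na.strings = c("", "NA"))

str(dat)
```

With the types settled at import, no conversion step is needed afterwards for this data.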

But you can always write a function and pass the vector of type names to it.

    convert.magic <- function(x, y = NA) {
      for (i in 1:length(y)) {
        if (y[i] == "numeric") {
          x[i] <- as.numeric(x[[i]])
        }
        if (y[i] == "character")
          x[i] <- as.character(x[[i]])
      }
      return(x)
    }

    foo <- convert.magic(foo, c("character", "character", "numeric"))
    > str(foo)
    'data.frame': 10 obs. of 3 variables:
     $ x: chr "1" "2" "3" "4" ...
     $ y: chr "red" "red" "red" "blue" ...
     $ z: num 15254 15255 15256 15257 15258 ...
Oct 06 '11 at 10:15

I know I am quite late to answer, but using a loop together with the attributes function is a simple solution to your problem.

    names <- c("x", "y", "z")
    chclass <- c("character", "character", "numeric")
    for (i in (1:length(names))) {
      attributes(foo[, names[i]])$class <- chclass[i]
    }
Sep 30 '14 at 23:44

I just came across something similar with the RSQLite fetch method: the results come back as atomic data types. In my case, it was a date-time stamp that was causing me frustration. I found the setAs function very useful for getting as to work as expected. Here is my small example.

    ## data.frame conversion function
    convert.magic2 <- function(df, classes) {
      out <- lapply(1:length(classes), FUN = function(classIndex) {
        as(df[, classIndex], classes[classIndex])
      })
      names(out) <- colnames(df)
      return(data.frame(out))
    }

    ## small example case
    tmp.df <- data.frame('dt' = c("2013-09-02 09:35:06", "2013-09-02 09:38:24",
                                  "2013-09-02 09:38:42", "2013-09-02 09:38:42"),
                         'v' = c('1', '2', '3', '4'),
                         stringsAsFactors = FALSE)
    classes <- c('POSIXct', 'numeric')
    str(tmp.df)  # confirm that it has character datatype columns
    ## 'data.frame': 4 obs. of 2 variables:
    ##  $ dt: chr "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
    ##  $ v : chr "1" "2" "3" "4"

    ## is the dt column coercible to POSIXct?
    canCoerce(tmp.df$dt, "POSIXct")
    ## [1] FALSE

    ## and the convert.magic2 function fails also:
    tmp.df.n <- convert.magic2(tmp.df, classes)
    ## Error in as(df[, classIndex], classes[classIndex]) :
    ##   no method or default for coercing "character" to "POSIXct"

    ## a little reading reveals the setAs function
    setAs('character', 'POSIXct', function(from) { return(as.POSIXct(from)) })

    ## better answer for canCoerce
    canCoerce(tmp.df$dt, "POSIXct")
    ## [1] TRUE

    ## better answer from convert.magic2
    tmp.df.n <- convert.magic2(tmp.df, classes)

    ## column datatypes converted as I would like them!
    str(tmp.df.n)
    ## 'data.frame': 4 obs. of 2 variables:
    ##  $ dt: POSIXct, format: "2013-09-02 09:35:06" "2013-09-02 09:38:24" "2013-09-02 09:38:42" "2013-09-02 09:38:42"
    ##  $ v : num 1 2 3 4
May 17 '14 at 18:08

An addition to @joran's answer, in which convert.magic does not preserve numeric values in a factor-to-numeric conversion:

    convert.magic <- function(obj, types) {
      out <- lapply(1:length(obj), FUN = function(i) {
        FUN1 <- switch(types[i],
                       character = as.character,
                       numeric = as.numeric,
                       factor = as.factor)
        FUN1(obj[, i])
      })
      names(out) <- colnames(obj)
      as.data.frame(out, stringsAsFactors = FALSE)
    }

    foo <- data.frame(x = c(1:10),
                      y = c("red", "red", "red", "blue", "blue",
                            "blue", "yellow", "yellow", "yellow", "green"),
                      z = Sys.Date() + c(1:10))
    foo$x <- as.character(foo$x)
    foo$y <- as.character(foo$y)
    foo$z <- as.numeric(foo$z)
    str(foo)
    # 'data.frame': 10 obs. of 3 variables:
    #  $ x: chr "1" "2" "3" "4" ...
    #  $ y: chr "red" "red" "red" "blue" ...
    #  $ z: num 16777 16778 16779 16780 16781 ...

    foo.factors <- convert.magic(foo, rep("factor", 3))
    str(foo.factors)  # all factors

    foo.numeric.not.preserved <- convert.magic(foo.factors, c("numeric", "character", "numeric"))
    str(foo.numeric.not.preserved)
    # 'data.frame': 10 obs. of 3 variables:
    #  $ x: num 1 3 4 5 6 7 8 9 10 2
    #  $ y: chr "red" "red" "red" "blue" ...
    #  $ z: num 1 2 3 4 5 6 7 8 9 10
    # z comes out as 1 2 3...

The following should preserve the numeric values:

    ## as.numeric function that preserves numeric values when converting factor to numeric
    as.numeric.mod <- function(x) {
      if (is.factor(x)) as.numeric(levels(x))[x] else as.numeric(x)
    }

    ## Same as in @joran's answer, except for as.numeric.mod
    convert.magic <- function(obj, types) {
      out <- lapply(1:length(obj), FUN = function(i) {
        FUN1 <- switch(types[i],
                       character = as.character,
                       numeric = as.numeric.mod,
                       factor = as.factor)
        FUN1(obj[, i])
      })
      names(out) <- colnames(obj)
      as.data.frame(out, stringsAsFactors = FALSE)
    }

    foo.numeric <- convert.magic(foo.factors, c("numeric", "character", "numeric"))
    str(foo.numeric)
    # 'data.frame': 10 obs. of 3 variables:
    #  $ x: num 1 2 3 4 5 6 7 8 9 10
    #  $ y: chr "red" "red" "red" "blue" ...
    #  $ z: num 16777 16778 16779 16780 16781 ...
    # z comes out with the correct numeric values
Dec 07 '15 at 8:25

A fairly simple data.table solution, although it will take a few steps if you are converting to many different column types.

    library(data.table)
    # every column needs the same length (10 rows), so y and z use 11:20
    dt <- data.table(x = c(1:10), y = c(11:20), z = c(11:20), name = letters[1:10])
    dt <- dt[, lapply(.SD, as.numeric), by = name]

This converts every column except those specified in by to numeric (or to whatever function you pass to lapply).

Jun 05 '16 at 20:28

Similar to type.convert(foo, as.is = TRUE), there is also readr::type_convert, which converts the columns of a data frame to appropriate classes without you having to specify them:

 readr::type_convert(foo) 



If all the columns are character, we can also use readr::parse_guess, which converts each column to an appropriate class automatically. Consider this modified data frame.

    foo <- data.frame(x = as.character(1:10),
                      y = c("red", "red", "red", "blue", "blue",
                            "blue", "yellow", "yellow", "yellow", "green"),
                      z = as.character(Sys.Date() + c(1:10)),
                      stringsAsFactors = FALSE)
    str(foo)
    # 'data.frame': 10 obs. of 3 variables:
    #  $ x: chr "1" "2" "3" "4" ...
    #  $ y: chr "red" "red" "red" "blue" ...
    #  $ z: chr "2019-08-12" "2019-08-13" "2019-08-14" "2019-08-15" ...

Applying parse_guess to each column:

    foo[] <- lapply(foo, readr::parse_guess)
    str(foo)
    # 'data.frame': 10 obs. of 3 variables:
    #  $ x: num 1 2 3 4 5 6 7 8 9 10
    #  $ y: chr "red" "red" "red" "blue" ...
    #  $ z: Date, format: "2019-08-12" "2019-08-13" "2019-08-14" "2019-08-15" ...
Aug 11 '19 at 5:36

transform is what you seem to be describing:

 foo <- transform(foo, x=as.character(x), y=as.character(y), z=as.numeric(z)) 
Dec 19 '17 at 2:53

Using purrr and base:

    foo <- data.frame(x = c(1:10),
                      y = c("red", "red", "red", "blue", "blue",
                            "blue", "yellow", "yellow", "yellow", "green"),
                      z = Sys.Date() + c(1:10))
    types <- c("character", "character", "numeric")
    types <- paste0("as.", types)
    purrr::map2_df(foo, types, function(x, y) do.call(y, list(x)))
    # A tibble: 10 x 3
      x     y         z
      <chr> <chr> <dbl>
    1 1     red   18127
    2 2     red   18128
    3 3     red   18129
    4 4     blue  18130
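For readers without purrr, the same pairing of columns with converter names can be sketched in base R using Map() and match.fun(). This is an illustrative variant rather than part of the answer above; the three-row foo and the converters list are made up for the example.

```r
# Small hypothetical data frame with a fixed date so the result is reproducible
foo <- data.frame(x = 1:3,
                  y = c("red", "blue", "green"),
                  z = as.Date("2020-01-01") + 0:2,
                  stringsAsFactors = FALSE)

types <- c("character", "character", "numeric")

# Look up as.character / as.numeric by name
converters <- lapply(paste0("as.", types), match.fun)

# Pair each column with its converter; foo[] keeps the data frame structure
foo[] <- Map(function(col, f) f(col), foo, converters)

str(foo)
```

Because the converter is looked up by name, any type string with a matching as.* function works, and a misspelled type fails early in match.fun() instead of silently doing nothing.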
Aug 18 '19 at 15:31


