Determining the Data Types of Data Frame Columns

Question

I am using R and have loaded data into a data frame using read.csv(). How do I determine the data type of each column in a data frame?

+74

r dataframe

stackoverflowuser2010 Jan 14 '14 at 22:20

6 answers

 sapply(yourdataframe, class)

where yourdataframe is the name of your data frame.
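One caveat, sketched here with a made-up data frame: class() can return more than one string for a column (a POSIXct date-time column has class c("POSIXct", "POSIXt")). When that happens, sapply() can no longer simplify the result to a character vector and returns a list instead. If you need exactly one string per column, one option is vapply() keeping only the first class; the helper name first_class is hypothetical, chosen just for this illustration.

```r
# A made-up data frame: a date-time column (two classes) and an integer column
df <- data.frame(when = as.POSIXct("2014-01-14 22:20:00", tz = "UTC"), n = 1L)

# class() gives c("POSIXct", "POSIXt") for `when`, so sapply() returns a list here
sapply(df, class)

# Hypothetical helper: keep only the first class of each column.
# vapply() guarantees a character vector with one element per column.
first_class <- function(frame) vapply(frame, function(col) class(col)[1], character(1))
first_class(df)  # returns c(when = "POSIXct", n = "integer")
```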

+26

Wilmer E. Henao Jan 14 '14 at 22:24

I would suggest

 sapply(foo, typeof)

if you want the actual types of the vectors in the data frame. class() is a different beast.

If you don't need this information as a vector (i.e. you don't need to do anything else with it programmatically later), just use str(foo).

In both cases, replace foo with the name of your data frame.
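To make the class() versus typeof() difference concrete, here is a small sketch with a made-up data frame holding a Date column and a factor column. class() reports the high-level class, while typeof() reports the underlying storage type: a Date is stored as a double and a factor as integer codes.

```r
# A made-up data frame with a Date column and a factor column
df <- data.frame(d = as.Date("2014-01-14"), f = factor("a"))

sapply(df, class)   # d: "Date",   f: "factor"
sapply(df, typeof)  # d: "double", f: "integer"  (the underlying storage)

str(df)             # compact human-readable summary, meant for reading, not computing on
```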

+9

Gavin Simpson Jan 14 '14 at 22:57

Here is a function from the helpRFunctions package that returns a list of all the different data types in your data frame, along with the names of the variables of each type.

 install.packages('devtools')  # only needed if you don't already have devtools installed
 library(devtools)
 install_github('adam-m-mcelhinney/helpRFunctions')
 library(helpRFunctions)
 my.data <- data.frame(y=rnorm(5), x1=c(1:5),
                       x2=c(TRUE, TRUE, FALSE, FALSE, FALSE), X3=letters[1:5],
                       stringsAsFactors=TRUE)  # R >= 4.0.0 no longer creates factors by default
 t <- list.df.var.types(my.data)
 t$factor
 t$integer
 t$logical
 t$numeric

Then you can do something like var(my.data[t$numeric]).
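If you would rather not install a package from GitHub, a similar grouping can be sketched in base R with split(). This is not how list.df.var.types() is implemented (its source is not shown here); it simply groups column names by the first element of each column's class. The names col_class and vars_by_type are hypothetical.

```r
my.data <- data.frame(y = rnorm(5), x1 = c(1:5),
                      x2 = c(TRUE, TRUE, FALSE, FALSE, FALSE), X3 = letters[1:5],
                      stringsAsFactors = TRUE)

# First class of each column, e.g. c(y = "numeric", x1 = "integer", ...)
col_class <- vapply(my.data, function(col) class(col)[1], character(1))

# Group the column names by class: a list with elements factor, integer, logical, numeric
vars_by_type <- split(names(my.data), col_class)
vars_by_type$numeric  # "y"

# Use the grouping to select columns, e.g. the variance of the numeric columns
var(my.data[vars_by_type$numeric])
```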

Hope this helps!

+2

ML_Dev Nov 25 '14 at 23:25

Just pass your data frame to the following function:

 data_types <- function(frame) {
   res <- lapply(frame, class)
   res_frame <- data.frame(unlist(res))
   barplot(table(res_frame), main="Data Types", col="steelblue",
           ylab="Number of Features")
 }

to plot how many columns of each data type your data frame has. For the iris dataset, the plot has two bars: factor (1 column) and numeric (4 columns).

+2

Cybernetic Dec 27 '16 at 23:54

Since this has not been stated explicitly yet, I'll add it:

I was looking for a way to create a table that counts how many columns there are of each data type.

Say we have a data.frame with two numeric columns and one logical column:

 dta <- data.frame(a = c(1,2,3), b = c(4,5,6), c = c(TRUE, FALSE, TRUE))

You can count the number of columns of each data type using

 table(unlist(lapply(dta, class)))
 # logical numeric
 #       1       2

This is very convenient if you have many columns and want a quick overview.

To give credit: this solution was inspired by @Cybernetic's answer.

+2

loki Aug 18 '17 at 11:36

gung · Accepted Answer · 2014-01-14 22:55

Your best bet to start is to use str() (see ?str). To explore some examples, let's make some data:

 set.seed(3221)  # this makes the example exactly reproducible
 my.data <- data.frame(y=rnorm(5),
                       x1=c(1:5),
                       x2=c(TRUE, TRUE, FALSE, FALSE, FALSE),
                       X3=letters[1:5],
                       stringsAsFactors=TRUE)  # needed in R >= 4.0.0 for X3 to be a factor

@Wilmer E Henao H's solution is very streamlined:

 sapply(my.data, class)
 #         y        x1        x2        X3
 # "numeric" "integer" "logical"  "factor"

Using str(), you get this information plus some extra goodies (such as the levels of your factors and the first few values of each variable):

 str(my.data)
 # 'data.frame': 5 obs. of  4 variables:
 #  $ y : num  1.03 1.599 -0.818 0.872 -2.682
 #  $ x1: int  1 2 3 4 5
 #  $ x2: logi  TRUE TRUE FALSE FALSE FALSE
 #  $ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
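One thing to watch with str() on wide data frames: by default it lists at most 99 components (controlled by its list.len argument) and then truncates the output. A small sketch with a made-up 150-column data frame:

```r
# A made-up wide data frame: 1 row, 150 integer columns V1 ... V150
wide <- as.data.frame(matrix(1:150, nrow = 1))

str(wide)                         # output is cut off after the first 99 columns
str(wide, list.len = ncol(wide))  # raise list.len to show every column
```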

Gavin Simpson's approach is also streamlined, but provides slightly different information than class():

 sapply(my.data, typeof)
 #         y        x1        x2        X3
 #  "double" "integer" "logical" "integer"

For more about class, typeof, and the middle child, mode, see this excellent SO thread: A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient.
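To see all three side by side, here is a sketch that rebuilds the my.data example from above and tabulates class(), typeof() and mode() for each column. Note that mode() reports a factor as "numeric", because its underlying storage is integer codes. The name types is just for illustration.

```r
my.data <- data.frame(y = rnorm(5), x1 = c(1:5),
                      x2 = c(TRUE, TRUE, FALSE, FALSE, FALSE), X3 = letters[1:5],
                      stringsAsFactors = TRUE)

# One row per column of my.data, one column per way of asking "what type is this?"
types <- data.frame(class  = sapply(my.data, class),
                    typeof = sapply(my.data, typeof),
                    mode   = sapply(my.data, mode),
                    stringsAsFactors = FALSE)
types
```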
