Delete columns with missing values

Question

I have a table with many columns and I want to delete columns with more than 500 missing values.

I already know the number of missing values for each column:

 library(fields)
 t(stats(mm))

I got:

      N    mean Std.Dev. min Q1 median Q3 max missing values
  V1  1600 8.67 … 400

Some columns show NA for all statistics:

      N  mean Std.Dev. min Q1 median Q3 max missing values
  V50 NA NA NA NA NA NA

I also want to remove these columns.

+4

r

Delphine Sep 7 '11 at 8:30

5 answers

Here is a one-liner that keeps only the columns with at most 500 missing values:

 mm[colSums(is.na(mm)) <= 500]
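To see the effect on a small case, here is a minimal sketch with a toy data frame and a threshold of 2 instead of 500. The names `toy`, `max_na`, `na_counts` and `toy_clean` are made up for illustration and do not come from the question.

```r
# Toy data frame (hypothetical); threshold lowered from 500 to 2
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, 7, 8),
  c = c(9, NA, NA, NA),
  d = c(NA, NA, NA, NA)
)
max_na <- 2

# Count the NAs in each column, then keep columns at or below the threshold
na_counts <- colSums(is.na(toy))
toy_clean <- toy[na_counts <= max_na]

names(toy_clean)
```

A column made up entirely of NA has as many missing values as there are rows, so it is dropped as well whenever the threshold is smaller than the number of rows.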

+8

Ramnath Sep 7 '11 at 17:05

Another potential solution (works especially well with data frames). Note that it drops every column containing at least one NA, not only those with more than 500:

data[,!sapply(data,function(x) any(is.na(x)))]
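The same `sapply` pattern can be adapted to a threshold. The following is only a sketch, assuming a small data frame `df` and a limit `max_na` (both hypothetical names), with `drop = FALSE` added so the result stays a data frame even when a single column survives.

```r
df <- data.frame(
  x = c(1, NA, 3, 4, 5),
  y = c(NA, NA, NA, 4, 5),
  z = c(1, 2, 3, 4, 5)
)
max_na <- 1

# Pattern from the answer: keep only columns without any NA
no_na <- df[, !sapply(df, function(x) any(is.na(x))), drop = FALSE]

# Threshold variant: keep columns with at most max_na missing values
few_na <- df[, sapply(df, function(x) sum(is.na(x)) <= max_na), drop = FALSE]

names(no_na)
names(few_na)
```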

+3

chandler Jan 29 '14 at 19:34

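 rem <- integer(0)
 for (col.nr in seq_len(ncol(data))) {
   n.na <- sum(is.na(data[, col.nr]))
   # more than 500 NAs, or nothing but NAs
   if (n.na > 500 || n.na == nrow(data)) rem <- c(rem, col.nr)
 }
 # skip when rem is empty
 if (length(rem) > 0) data <- data[, -rem, drop = FALSE]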

0

pvoosten Sep 7 '11 at 8:47

m is the matrix you are working with. The code below creates a vector, wntg (short for "which need to go"), listing the columns that contain more than a given number of NA values; for the question, that number would be 500.

The condition in this comparison can easily be changed to suit your needs.

Then create a new matrix, which I call mr (a reduced m), with the columns listed in wntg removed.

In this simple example, I exclude columns with more than two NA values:

 wntg <- which(colSums(is.na(m)) > 2)

 mr <- m[, -c(wntg)]

 > m <- matrix(c(1,2,3,4,NA,NA,7,8,9,NA,NA,NA), nrow=4, ncol=3)
 > m
      [,1] [,2] [,3]
 [1,]    1   NA    9
 [2,]    2   NA   NA
 [3,]    3    7   NA
 [4,]    4    8   NA
 > wntg <- which(colSums(is.na(m)) > 2)
 > wntg
 [1] 3
 > mr <- m[, -c(wntg)]
 > mr
      [,1] [,2]
 [1,]    1   NA
 [2,]    2   NA
 [3,]    3    7
 [4,]    4    8
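One caveat with negative indexing through which(): when no column exceeds the threshold, which() returns an empty vector, and indexing with its negation selects zero columns instead of all of them. A sketch of a logical-index variant that avoids this, using a hypothetical matrix `m2` (not part of the answer):

```r
m2 <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8), nrow = 4, ncol = 2)

# No column has more than 2 NAs, so which() returns an empty vector
wntg2 <- which(colSums(is.na(m2)) > 2)

# Negative indexing with an empty vector selects no columns at all
lost <- m2[, -c(wntg2), drop = FALSE]

# Logical indexing keeps every column when nothing needs to go
keep <- colSums(is.na(m2)) <= 2
mr2 <- m2[, keep, drop = FALSE]

ncol(lost)
ncol(mr2)
```

The `drop = FALSE` argument keeps the result a matrix even when only one column remains.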

0

Flydr Aug 05 '14 at 0:07

Nick sabbe · Accepted Answer · 2011-09-07T08:43:03+0000

If you save the result of the stats call as follows:

 tmpres<-t(stats(mm))

You can do something like:

 whichcolsneedtogo<-apply(tmpres, 1, function(currow){all(is.na(currow)) || (currow["missing values"] > 500)})

Finally:

 mmclean<-mm[!whichcolsneedtogo]

Of course, this is untested because you did not provide data to reproduce your example.
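For a quick check of the idea without the original data, here is a small sketch. It does not call fields::stats(); instead it builds a stand-in summary matrix by hand, with one row per column of mm and a "missing values" column as in the output shown in the question. The matrix contents, the reduced set of statistics, and the threshold of 2 are all assumptions made for illustration.

```r
mm <- data.frame(
  V1 = c(1, 2, 3, 4),
  V2 = c(NA, NA, NA, 8),
  V3 = c(NA, NA, NA, NA)
)

# Hand-built stand-in for t(stats(mm)); the all-NA row mimics the
# V50 row shown in the question
tmpres <- rbind(
  V1 = c(N = 4, mean = 2.5, "missing values" = 0),
  V2 = c(N = 1, mean = 8, "missing values" = 3),
  V3 = c(N = NA, mean = NA, "missing values" = NA)
)

max_na <- 2  # hypothetical threshold; the question uses 500

# Flag a column when its summary row is all NA or its NA count is too high
whichcolsneedtogo <- apply(tmpres, 1, function(currow) {
  all(is.na(currow)) || (currow["missing values"] > max_na)
})

mmclean <- mm[!whichcolsneedtogo]
names(mmclean)
```

Checking all(is.na(currow)) first matters: it short-circuits the `||`, so the comparison against an NA count is never evaluated for the all-NA rows.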
