Delete columns with missing values

I have a table with many columns and I want to delete columns with more than 500 missing values.

I already know the number of missing values for each column:

 library(fields)
 t(stats(mm))

I got:

     N    mean Std.Dev. min Q1 median Q3 max missing values
 V1  1600 8.67 … 400

Some columns show NA for all characteristics:

     N    mean Std.Dev. min Q1 median Q3 max missing values
 V50 NA NA NA NA NA NA

I also want to remove these columns.

5 answers

If you save the result of the stats() call as follows:

 tmpres<-t(stats(mm)) 

You can do something like:

 whichcolsneedtogo<-apply(tmpres, 1, function(currow){all(is.na(currow)) || (currow["missing values"] > 500)}) 

Finally:

 mmclean<-mm[, !whichcolsneedtogo, drop = FALSE] 

Of course, this is untested, because you did not provide data to reproduce your example.


Here is a one-liner to do this. It keeps only the columns with at most 500 missing values:

 mm[colSums(is.na(mm)) <= 500]
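
To see the column-wise NA count at work, here is a minimal sketch on made-up data. The helper name `drop_sparse_columns`, the toy data frame, and the threshold of 2 (standing in for 500) are assumptions for illustration, not part of the original answer. The explicit all-NA check only matters when the table has no more rows than the threshold; otherwise an all-NA column already exceeds it.

```r
# Hypothetical helper: drop columns whose NA count exceeds max_missing,
# plus columns that consist entirely of NA.
drop_sparse_columns <- function(df, max_missing) {
  n_missing   <- colSums(is.na(df))        # NA count per column
  all_missing <- n_missing == nrow(df)     # columns with no observed value
  df[, !(n_missing > max_missing | all_missing), drop = FALSE]
}

# Toy data (made up): 4 rows, threshold of 2 instead of 500
toy <- data.frame(
  V1 = c(1, 2, 3, 4),        # 0 missing: kept
  V2 = c(NA, 2, NA, 4),      # 2 missing: kept, since 2 is not more than 2
  V3 = c(NA, NA, NA, 4),     # 3 missing: dropped
  V4 = c(NA, NA, NA, NA)     # all missing: dropped
)

cleaned <- drop_sparse_columns(toy, max_missing = 2)
names(cleaned)
```

With `drop = FALSE` the result stays a data frame even if only one column survives.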


Another potential solution (works especially well with data frames). Note that it drops every column that contains at least one NA:

data[,!sapply(data,function(x) any(is.na(x)))]
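
The difference between dropping columns with any NA and dropping columns above a threshold can be seen side by side in the following sketch. The data frame `toy` and the threshold of 1 are made up for illustration; `anyNA()` is the base R shortcut for `any(is.na(x))`.

```r
# Toy data (made up): one complete column, one with a single NA, one all NA
toy <- data.frame(
  a = c(1, 2, 3),
  b = c(1, NA, 3),
  c = c(NA, NA, NA)
)

# Strict variant: keep only columns without a single NA
no_na <- toy[, !sapply(toy, anyNA), drop = FALSE]

# Threshold variant: keep columns with at most 1 NA
few_na <- toy[, colSums(is.na(toy)) <= 1, drop = FALSE]

names(no_na)
names(few_na)
```

The strict variant keeps only `a`, while the threshold variant also keeps `b`.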

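 rem <- NULL
 for (col.nr in seq_len(ncol(data))) {
   # flag the column if it has more than 500 NAs or consists only of NAs
   if (sum(is.na(data[, col.nr])) > 500 || all(is.na(data[, col.nr]))) {
     rem <- c(rem, col.nr)
   }
 }
 # data[, -NULL] would select zero columns, so only subset when something was flagged
 if (length(rem) > 0) data[, -rem, drop = FALSE] else data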

m is the matrix you are working with. The code below creates a vector, wntg (meaning the ones that have to go), that lists the columns with more than 500 NA values.

The terms of this comparison can be easily changed to suit your needs.

Then create a new matrix, which I call mr (meaning a reduced m), with the columns listed in wntg removed.

In this simple example, I use the case where you want to exclude columns with more than two NA values:

 wntg <- which(colSums(is.na(m)) > 2)

 mr <- m[, -c(wntg)]

 > m<-matrix(c(1,2,3,4,NA,NA,7,8,9,NA,NA,NA), nrow=4, ncol =3)
 > m
      [,1] [,2] [,3]
 [1,]    1   NA    9
 [2,]    2   NA   NA
 [3,]    3    7   NA
 [4,]    4    8   NA
 > wntg<-which(colSums(is.na(m))>2)
 > wntg
 [1] 3
 > mr<-m[,-c(wntg)]
 > mr
      [,1] [,2]
 [1,]    1   NA
 [2,]    2   NA
 [3,]    3    7
 [4,]    4    8
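
One caveat with negative indexing through `which()`: if no column exceeds the threshold, `which()` returns an empty vector, and indexing with it selects zero columns instead of all of them. The sketch below reuses the matrix from the example; the variable names `none`, `keep_all`, and `keep_one` and the thresholds 4 and 1 are assumptions for illustration. Logical indexing avoids the problem, and `drop = FALSE` keeps the result a matrix when a single column remains.

```r
m <- matrix(c(1, 2, 3, 4, NA, NA, 7, 8, 9, NA, NA, NA), nrow = 4, ncol = 3)

# No column has more than 4 NAs, so which() returns an empty vector
none <- which(colSums(is.na(m)) > 4)
lost <- m[, -c(none), drop = FALSE]      # selects zero columns
ncol(lost)

# Logical indexing keeps all three columns in the same situation
keep_all <- colSums(is.na(m)) <= 4
kept <- m[, keep_all, drop = FALSE]
ncol(kept)

# With a threshold of 1 only the first column survives;
# drop = FALSE keeps it a 4 x 1 matrix rather than a plain vector
keep_one <- colSums(is.na(m)) <= 1
one_col <- m[, keep_one, drop = FALSE]
dim(one_col)
```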