When to use na.omit compared to complete.cases

Question


I have the following code comparing na.omit and complete.cases:

    > mydf
      AA BB
    1  2  2
    2 NA  5
    3  6  8
    4  5 NA
    5  9  6
    6 NA  1
    > na.omit(mydf)
      AA BB
    1  2  2
    3  6  8
    5  9  6
    > mydf[complete.cases(mydf),]
      AA BB
    1  2  2
    3  6  8
    5  9  6
    > str(na.omit(mydf))
    'data.frame':   3 obs. of  2 variables:
     $ AA: int  2 6 9
     $ BB: int  2 8 6
     - attr(*, "na.action")=Class 'omit'  Named int [1:3] 2 4 6
      .. ..- attr(*, "names")= chr [1:3] "2" "4" "6"
    > str(mydf[complete.cases(mydf),])
    'data.frame':   3 obs. of  2 variables:
     $ AA: int  2 6 9
     $ BB: int  2 8 6
    > identical(na.omit(mydf), mydf[complete.cases(mydf),])
    [1] FALSE

Are there situations where one or the other should be used, or are they practically the same?


r

rnso Apr 6 '15 at 13:46



nrussell · Accepted Answer · 2015-04-06T15:28:50+0000

It is true that na.omit and complete.cases are functionally the same when complete.cases is applied to all columns of your object (e.g. a data.frame):

    R> all.equal(na.omit(mydf), mydf[complete.cases(mydf),], check.attributes = FALSE)
    [1] TRUE
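The `check.attributes = FALSE` argument is what hides the difference: the question's `identical()` call returns FALSE because of the extra attribute that na.omit attaches. A minimal sketch, assuming `mydf` can be rebuilt from the printed output in the question, that strips `na.action` and compares again:

```r
# Rebuild the example data frame from the printed output in the question
mydf <- data.frame(AA = c(2L, NA, 6L, 5L, 9L, NA),
                   BB = c(2L, 5L, 8L, NA, 6L, 1L))

by_omit  <- na.omit(mydf)
by_cases <- mydf[complete.cases(mydf), ]

identical(by_omit, by_cases)   # FALSE: by_omit carries an "na.action" attribute

# Remove the attribute that na.omit() added, then compare again
attr(by_omit, "na.action") <- NULL
identical(by_omit, by_cases)   # TRUE: same rows, same row names, same columns
```

Both results come from subsetting the data frame with the same logical row index, so once the attribute is gone nothing else distinguishes them.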

But I see two fundamental differences between these two functions (there may well be additional differences). First, na.omit adds an na.action attribute to the object, recording how the data were changed with respect to missing values (i.e. which rows were dropped). Here is a trivial usage example:

    foo <- function(data) {
      data <- na.omit(data)
      # The "na.action" attribute holds the indices of the dropped rows
      # (it is NULL, i.e. length 0, when no rows were dropped)
      n <- length(attr(data, "na.action"))
      message(sprintf("Note: %i rows removed due to missing values.", n))
      # do something with data
    }
    ## R> foo(mydf)
    ## Note: 3 rows removed due to missing values.

where we provide the user with some relevant information. I'm sure a more creative person could (and probably will) find a better use for the na.action attribute, but you get the point.
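The stats package already has small helpers for reading that attribute: `na.action()` returns it, and `naprint()` turns it into the same kind of message that `summary.lm()` prints. A minimal sketch, again assuming `mydf` rebuilt from the question's output:

```r
mydf <- data.frame(AA = c(2L, NA, 6L, 5L, 9L, NA),
                   BB = c(2L, 5L, 8L, NA, 6L, 1L))

cleaned <- na.omit(mydf)
dropped <- na.action(cleaned)   # equivalent to attr(cleaned, "na.action")

unclass(dropped)   # positions of the dropped rows, named by their row names
names(dropped)     # row names of the dropped rows: "2" "4" "6"
naprint(dropped)   # "3 observations deleted due to missingness"
```

If nothing was dropped, na.omit does not set the attribute at all, so `na.action()` returns NULL; code that relies on it should allow for that case.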

Second, complete.cases lets you check only some of the columns for missing values, for example:

    R> mydf[complete.cases(mydf[,1]),]
      AA BB
    1  2  2
    3  6  8
    4  5 NA
    5  9  6

Depending on what your variables represent, you may feel comfortable imputing values for column BB but not for column AA, so using complete.cases like this gives you finer control.
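Doing the same with na.omit is awkward, because na.omit applied to a subset of columns returns only those columns, whereas a complete.cases result is a logical row index that can be applied back to the full data frame. A minimal sketch, assuming the same `mydf` rebuilt from the question's output (columns are selected by name here rather than by position as in `mydf[,1]`; both pick column AA):

```r
mydf <- data.frame(AA = c(2L, NA, 6L, 5L, 9L, NA),
                   BB = c(2L, 5L, 8L, NA, 6L, 1L))

# na.omit() on a subset drops the other columns entirely
na.omit(mydf["AA"])        # a one-column data frame: BB is gone

# complete.cases() on a subset gives a row index usable on the full data
keep <- complete.cases(mydf["AA"])
mydf[keep, ]               # rows 1, 3, 4, 5; row 4 still has NA in BB
```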

