It is true that na.omit and complete.cases are functionally the same when complete.cases is applied to all columns of your object (e.g. a data.frame):
R> all.equal(na.omit(mydf), mydf[complete.cases(mydf), ], check.attributes = FALSE)
[1] TRUE
But I see two fundamental differences between these two functions (there may well be additional differences). First, na.omit adds the na.action attribute to the object, recording which rows were removed because of missing values. I imagine a trivial usage example:
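foo <- function(data) {
  data <- na.omit(data)
  # na.action holds the positions of the rows that na.omit dropped
  n <- length(attr(data, "na.action"))
  message(sprintf("Note: %i rows removed due to missing values.", n))
  # return the cleaned data without printing it
  invisible(data)
}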
where we provide the user with some relevant information. I'm sure a more creative person could (and probably will) find a better use of the na.action attribute, but you get the point.
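To make the attribute itself concrete, here is a minimal sketch of inspecting it directly. The data frame below is hypothetical, made up for illustration; it is not the mydf used elsewhere in this answer.

```r
# Hypothetical data (an assumption for illustration): NAs in both columns
mydf <- data.frame(AA = c(2, NA, 6, 5, 9),
                   BB = c(2, 3, 8, NA, 6))

cleaned <- na.omit(mydf)

# na.action is an integer vector of the dropped row positions,
# named by row name and carrying class "omit"
dropped <- attr(cleaned, "na.action")

length(dropped)      # number of rows removed
as.integer(dropped)  # positions of the removed rows in the original data
class(dropped)       # "omit"
```

Since the attribute stores positions rather than just a count, it can also be used to look up the dropped rows in the original object, e.g. `mydf[as.integer(dropped), ]`.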
Secondly, complete.cases allows partial handling of missing values, for example:
R> mydf[complete.cases(mydf[,1]),]
  AA BB
1  2  2
3  6  8
4  5 NA
5  9  6
Depending on what your variables represent, you may feel comfortable imputing values for column BB but not for column AA, so using complete.cases like this gives you finer control.
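A rough sketch of that workflow, under stated assumptions: the data frame is hypothetical, and mean imputation is used only as a placeholder for whatever imputation method suits your data, not as a recommendation.

```r
# Hypothetical data (an assumption for illustration)
mydf <- data.frame(AA = c(2, NA, 6, 5, 9),
                   BB = c(2, 3, 8, NA, 6))

# Drop rows only where AA is missing; rows with NA in BB are kept
keep <- complete.cases(mydf[, "AA", drop = FALSE])
partial <- mydf[keep, ]

# Fill the remaining gaps in BB; the mean is just a stand-in method
partial$BB[is.na(partial$BB)] <- mean(partial$BB, na.rm = TRUE)

partial
```

Selecting the column by name with `drop = FALSE` keeps the argument a data frame, which makes it easy to extend the check to several columns, e.g. `mydf[, c("AA", "CC"), drop = FALSE]` for a hypothetical third column CC.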