A subset of rows containing NA values (missing) in the selected column of the data frame

Question

We have a data frame read from a CSV file. The data frame DF contains columns of observed values and a column (Var2) holding the date on which the measurement was taken. If the date was not recorded, the CSV file contains the value NA for the missing data.

 Var1 Var2
 10   2010/01/01
 20   NA
 30   2010/03/01

We would like to use the subset command to define a new data frame new_DF that contains only the rows with the value NA in the column Var2. In the example above, new_DF would contain only row 2.

The command

 new_DF<-subset(DF,DF$Var2=="NA")

does not work: the resulting data frame has no rows.

If NA is replaced with NULL in the source CSV file, the same command gives the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I make this method work when the missing value is given as the character string NA in the source CSV file?

r csv dataframe na subset

John Nov 02 2018-11-12T00:

5 answers

Joris Meys · Answer 1 · 2011-11-02 13:02

Never use == 'NA' to check for missing values; use is.na() instead. To keep every row that has an NA in any column:

 new_DF <- DF[rowSums(is.na(DF)) > 0,]

Or, to check only a specific column:

 new_DF <- DF[is.na(DF$Var2),]

If the data contains the character string 'NA' rather than real missing values, first run

 DF[DF=='NA'] <- NA

to replace them with missing values.
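
To see why the comparison in the question returns no rows, here is a minimal sketch. It assumes a hand-built copy of the example data (constructed inline rather than read from the CSV file), in which Var2 already holds a real missing value:

```r
# Hypothetical copy of the example data; Var2 contains a real NA
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", NA, "2010/03/01"),
                 stringsAsFactors = FALSE)

# Comparing with == propagates NA instead of returning TRUE
cmp <- DF$Var2 == "NA"            # FALSE NA FALSE

# subset() treats an NA condition as FALSE, so no rows are kept
empty_DF <- subset(DF, Var2 == "NA")

# is.na() tests for missingness directly and keeps row 2
new_DF <- subset(DF, is.na(Var2))
```

Inside subset() the column can be referred to by its bare name, so the DF$ prefix is not needed.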

maressyl · Answer 2 · 2011-11-02 13:32

NA is a special value in R; do not confuse the value NA with the string "NA". Depending on how you import the data, your "NA" and "NULL" cells may end up with different types (the default behavior is to convert "NA" strings to NA and to leave "NULL" strings as they are).

If you use read.table() or read.csv(), consider the na.strings argument for a clean data import, and always work with real R NA values.

An example that works in both cases: "NULL" and "NA":

 DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
 new_DF <- subset(DF, is.na(DF$Var2))
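
The effect of na.strings can be checked without a file on disk. The sketch below assumes a small inline CSV string (csv_text is a made-up stand-in for file.csv) and passes it through the text argument of read.csv():

```r
# Hypothetical CSV content standing in for file.csv
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,NULL"

# Default import: "NA" becomes a missing value, "NULL" stays a string
DF_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings, both markers become real NA values
DF_clean <- read.csv(text = csv_text, na.strings = c("NA", "NULL"),
                     stringsAsFactors = FALSE)

n_default <- sum(is.na(DF_default$Var2))  # 1
n_clean   <- sum(is.na(DF_clean$Var2))    # 2
```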

user3226167 · Answer 3 · 2017-11-03 09:35

complete.cases() returns TRUE for rows in which no value is NA, so negating it selects the rows with at least one NA:

 DF[!complete.cases(DF), ]
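
Note that complete.cases(DF) looks at every column. As a rough sketch, assuming a made-up data frame with missing values in two different columns, the check can be limited to Var2 by passing only that column:

```r
# Hypothetical data with missing values in two different columns
DF <- data.frame(Var1 = c(10, NA, 30),
                 Var2 = c("2010/01/01", "2010/02/01", NA),
                 stringsAsFactors = FALSE)

# Rows with an NA in any column (rows 2 and 3)
any_na <- DF[!complete.cases(DF), ]

# Rows with an NA in Var2 only (row 3)
var2_na <- DF[!complete.cases(DF["Var2"]), ]
```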

drhnis · Answer 4 · 2017-11-21 23:57

Try this instead:

 new_DF<-dplyr::filter(DF,is.na(Var2))

jstar · Answer 5 · 2016-05-29 06:28

Prints all rows that contain NA data:

 tmp <- data.frame(c(1,2,3),c(4,NA,5))
 # arr.ind = TRUE returns row/column positions instead of linear indices
 tmp[unique(which(is.na(tmp), arr.ind=TRUE)[,"row"]),]
