Subset of rows containing NA (missing) values in a chosen column of a data frame

We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date on which a measurement was taken. If the date was not recorded, the CSV file contains the value NA, for missing data.

 Var1 Var2
 10   2010/01/01
 20   NA
 30   2010/03/01

We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example above, new_DF would contain only row 2.

The command

 new_DF<-subset(DF,DF$Var2=="NA") 

does not work: the resulting data frame has no rows.

If the value NA is replaced with NULL in the source CSV file, the same command gives the desired result: new_DF<-subset(DF,DF$Var2=="NULL").

How can I get this method to work when the character string NA is what the source CSV file contains?

+51
r csv dataframe na subset
Nov 02 2018-11-12T00:
5 answers

Never use == 'NA' to test for missing values. Use is.na() instead. This selects every row that has an NA in any column:

 new_DF <- DF[rowSums(is.na(DF)) > 0,] 

or if you want to check a specific column, you can also use

 new_DF <- DF[is.na(DF$Var2),] 

If NA is stored as the character string 'NA' rather than as a missing value, first run

 DF[DF=='NA'] <- NA 

to replace them with missing values.
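To see why the comparison in the question returns no rows, here is a minimal sketch. It assumes a small hand-built data frame that mirrors the question's example, with a real NA in Var2 (which is what read.csv produces by default for the text NA):

```r
# Hand-built stand-in for the question's data; Var2 holds a real NA.
DF <- data.frame(
  Var1 = c(10, 20, 30),
  Var2 = c("2010/01/01", NA, "2010/03/01"),
  stringsAsFactors = FALSE
)

# Comparing a missing value with == never yields TRUE:
# the result for the NA element is itself NA.
cmp <- DF$Var2 == "NA"

# subset() keeps only rows where the condition is TRUE,
# so the NA row is dropped and nothing is left.
empty_DF <- subset(DF, Var2 == "NA")

# is.na() is TRUE exactly for the missing element.
new_DF <- DF[is.na(DF$Var2), ]
```

Here cmp contains an NA in its second position rather than TRUE, which is why subset() discards that row.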

+86
Nov 02 2018-11-11T00:

NA is a special value in R; do not confuse the NA value with the string "NA". Depending on how you import the data, your "NA" and "NULL" cells may end up with different types (the default behavior is to convert "NA" strings to NA and to leave "NULL" strings as they are).

If you use read.table() or read.csv(), consider the na.strings argument to get a clean import, and always work with real R NA values.

An example that works in both cases: "NULL" and "NA":

 DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
 new_DF <- subset(DF, is.na(DF$Var2))
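The difference between the default import and na.strings can be checked without a file. This is a sketch only: the inline CSV text is made up for illustration and stands in for file.csv, and it is passed through the text argument of read.csv:

```r
# Hypothetical CSV contents, one row with NA and one with NULL in Var2.
csv_text <- "Var1,Var2
10,2010/01/01
20,NA
30,NULL"

# Default import: only the string "NA" becomes a missing value,
# "NULL" stays an ordinary character string.
DF_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)
default_missing <- is.na(DF_default$Var2)

# With na.strings, both "NA" and "NULL" are read as real NA values.
DF <- read.csv(text = csv_text, na.strings = c("NA", "NULL"),
               stringsAsFactors = FALSE)
new_DF <- subset(DF, is.na(Var2))
```

With the default settings only the second row is flagged as missing; with na.strings set, new_DF holds the second and third rows.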
+33
Nov 02 2018-11-11T00:

complete.cases() returns TRUE for a row when none of its values are NA, so negating it selects the rows that contain at least one NA:

 DF[!complete.cases(DF), ] 
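Because complete.cases() looks at every column, it can return more rows than the question asks for. A small sketch, using made-up data in which Var1 also has a missing value, shows one way to restrict the check to Var2 by passing only that column:

```r
# Made-up data: row 2 is missing Var1, row 3 is missing Var2.
DF <- data.frame(
  Var1 = c(10, NA, 30),
  Var2 = c("2010/01/01", "2010/02/01", NA),
  stringsAsFactors = FALSE
)

# Rows with an NA in any column (rows 2 and 3).
any_na <- DF[!complete.cases(DF), ]

# Rows with an NA in Var2 only (row 3): pass just that column.
var2_na <- DF[!complete.cases(DF["Var2"]), ]
```

For a single column this gives the same rows as DF[is.na(DF$Var2), ].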
+1
Nov 03 '17 at 9:35

Try changing it to this:

 new_DF<-dplyr::filter(DF,is.na(Var2)) 
0
Nov 21 '17 at 23:57

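Prints all rows that contain NA data:

 tmp <- data.frame(c(1,2,3),c(4,NA,5))
 # which(..., arr.ind=TRUE) returns row/column positions; column 1 holds the row numbers
 tmp[unique(which(is.na(tmp), arr.ind=TRUE)[,1]),]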
-1
May 29 '16 at 6:28