Replace NA with previous and subsequent rows in R

How can I quickly replace an NA with the mean of the values in the previous and next rows?

      name grade
    1    A    56
    2    B    NA
    3    C    70
    4    D    96

so that the grade for B becomes 63 (the mean of 56 and 70).

3 answers

You can try na.approx from the zoo package: "Missing values (NAs) are replaced by linear interpolation"

    library(zoo)
    x <- c(56, NA, 70, 96)
    na.approx(x)
    # [1] 56 63 70 96

This also works if you have multiple consecutive NAs:

    vals <- c(1, NA, NA, 7, NA, 10)
    na.approx(vals)
    # [1]  1.0  3.0  5.0  7.0  8.5 10.0

na.approx is based on the base approx function, which can be used instead:

    vals <- c(1, NA, NA, 7, NA, 10)
    xout <- seq_along(vals)
    x <- xout[!is.na(vals)]
    y <- vals[!is.na(vals)]
    approx(x = x, y = y, xout = xout)$y
    # [1]  1.0  3.0  5.0  7.0  8.5 10.0
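To apply the same idea to the data frame from the question using base R only, the approx call can be wrapped in a small helper. This is a minimal sketch: the name interpolate_na is made up for illustration and is not part of zoo or base R. It assumes the column has at least two non-NA values (approx stops with an error otherwise), and with the default rule = 1 any leading or trailing NA stays NA.

```r
# Data from the question
df <- data.frame(name  = c("A", "B", "C", "D"),
                 grade = c(56, NA, 70, 96))

# Hypothetical helper: linear interpolation over the row index
interpolate_na <- function(v) {
  idx   <- seq_along(v)   # positions 1..n
  known <- !is.na(v)      # which positions hold observed values
  approx(x = idx[known], y = v[known], xout = idx)$y
}

df$grade <- interpolate_na(df$grade)
df$grade
# [1] 56 63 70 96
```

For a single NA between two known values, linear interpolation over the row index gives exactly the mean of the two neighbours, which is what the question asks for.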

Suppose you have a data.frame df as follows:

    > df
      name grade
    1    A    56
    2    B    NA
    3    C    70
    4    D    96
    5    E    NA
    6    F    95

Then you can use the following:

    > ind <- which(is.na(df$grade))
    > df$grade[ind] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+1]))))
    > df
      name grade
    1    A  56.0
    2    B  63.0
    3    C  70.0
    4    D  96.0
    5    E  95.5
    6    F  95.0
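The sapply version above assumes every NA has a known value on both sides. If the first row is NA, grade[0] is empty and the result is simply the next value; if the last row is NA, or two NAs are adjacent, the mean is NA. A vectorised sketch that makes this behaviour explicit is shown here; fill_neighbor_mean is a hypothetical name, not a function from any package, and it deliberately leaves an NA in place whenever either neighbour is missing.

```r
# Hypothetical helper: replace each NA with the mean of its two neighbours.
# An NA stays NA if it is at either end or if a neighbour is also NA.
fill_neighbor_mean <- function(x) {
  n    <- length(x)
  prev <- c(NA, x[-n])   # value in the previous row
  nxt  <- c(x[-1], NA)   # value in the next row
  idx  <- is.na(x)
  x[idx] <- (prev[idx] + nxt[idx]) / 2
  x
}

grade <- c(56, NA, 70, 96, NA, 95)
fill_neighbor_mean(grade)
# [1] 56.0 63.0 70.0 96.0 95.5 95.0
```

Because both neighbours are taken from the original vector, a run of consecutive NAs is left untouched rather than filled with partly imputed values.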

An alternative that uses the median instead of the mean is the na.roughfix function from the randomForest package. Note that it does not look at neighbouring rows. As described in the documentation, it works on a data frame or a numeric matrix: for numeric variables, NAs are replaced by the column median; for factor variables, NAs are replaced by the most frequent level (breaking ties at random). If the object contains no NAs, it is returned unchanged.

Using the same examples as @Henrik,

    library(randomForest)
    x <- c(56, NA, 70, 96)
    na.roughfix(x)
    # [1] 56 70 70 96

or with a larger matrix:

    y <- matrix(1:50, nrow = 10)
    y[sample(1:length(y), 4, replace = FALSE)] <- NA
    y
    #       [,1] [,2] [,3] [,4] [,5]
    #  [1,]    1   11   21   31   41
    #  [2,]    2   12   22   32   42
    #  [3,]    3   NA   23   33   NA
    #  [4,]    4   14   24   34   44
    #  [5,]    5   15   25   35   45
    #  [6,]    6   16   NA   36   46
    #  [7,]    7   17   27   37   47
    #  [8,]    8   18   28   38   48
    #  [9,]    9   19   29   39   49
    # [10,]   10   20   NA   40   50
    na.roughfix(y)
    #       [,1] [,2] [,3] [,4] [,5]
    #  [1,]    1   11 21.0   31   41
    #  [2,]    2   12 22.0   32   42
    #  [3,]    3   16 23.0   33   46
    #  [4,]    4   14 24.0   34   44
    #  [5,]    5   15 25.0   35   45
    #  [6,]    6   16 24.5   36   46
    #  [7,]    7   17 27.0   37   47
    #  [8,]    8   18 28.0   38   48
    #  [9,]    9   19 29.0   39   49
    # [10,]   10   20 24.5   40   50
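If installing randomForest only for this is not desirable, the column-median rule for numeric data can be approximated in base R. The sketch below is not randomForest's implementation; fill_col_median is a hypothetical name, and the NA positions are set by hand (matching the matrix printed above) so the result is reproducible without a random seed.

```r
# Same matrix as above, with the NA positions fixed by hand
y <- matrix(as.numeric(1:50), nrow = 10)
y[3, 2] <- NA
y[c(6, 10), 3] <- NA
y[3, 5] <- NA

# Hypothetical helper: replace NAs in each column with that column's median
fill_col_median <- function(m) {
  apply(m, 2, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
}

filled <- fill_col_median(y)
filled[3, 2]          # 16
filled[c(6, 10), 3]   # 24.5 24.5
filled[3, 5]          # 46
```

As with na.roughfix, every NA in a column receives the same value regardless of its neighbours, so this suits unordered data better than the interpolation approaches.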