R: Missing value replacement function

I have a table with missing values, and I'm trying to write a function that will replace each missing value with a calculation based on the nearest two non-missing values.

Example:

    X  Tom
    1  4.3
    2  5.1
    3  NA
    4  NA
    5  7.4

For X = 3, Tom = 5.1 + (7.4 - 5.1)/2.

For X = 4, Tom = (5.1 + (7.4 - 5.1)/2) + (7.4 - 5.1)/2.
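The calculation described above could be sketched in base R roughly as follows. This is only one reading of the example: the helper name `fill_half_steps` is hypothetical, and it assumes the step is always (next - previous)/2, added once per missing value, exactly as in the two lines above. Leading and trailing NAs are left untouched.

```r
# Hypothetical helper: fill each run of NAs lying between two observed values.
# Assumption: the step is (next - previous) / 2, added once per missing value,
# which is a literal reading of the example in the question.
fill_half_steps <- function(x) {
  obs <- which(!is.na(x))                      # positions of observed values
  for (g in seq_len(max(length(obs) - 1, 0))) {
    lo <- obs[g]                               # previous observed position
    hi <- obs[g + 1]                           # next observed position
    n_missing <- hi - lo - 1
    if (n_missing > 0) {
      step <- (x[hi] - x[lo]) / 2
      x[(lo + 1):(hi - 1)] <- x[lo] + step * seq_len(n_missing)
    }
  }
  x
}

tom <- c(4.3, 5.1, NA, NA, 7.4)
filled <- fill_half_steps(tom)
print(filled)
```

With this reading, the value at X = 4 comes out equal to the next observed value, since two half-steps add up to the full difference.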

Does a function for this already exist? If not, any advice would be greatly appreciated.

3 answers

A more common way to do this (but not equivalent to the question) is to use linear interpolation:

    library(zoo)
    df <- data.frame(X = 1:5, Tom = c(4.3, 5.1, NA, NA, 7.4))
    na.approx(df)

or spline interpolation:

 na.spline(df) 
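To see how linear interpolation differs from the formula in the question, here is a small sketch using base R's `approx()` rather than zoo, so that it runs without extra packages. The variable names are just for illustration. The two missing values are spaced evenly between 5.1 and 7.4, in thirds of the difference rather than halves.

```r
# Linear interpolation of the question's data with base R only.
tom <- c(4.3, 5.1, NA, NA, 7.4)
observed <- which(!is.na(tom))       # positions that have a value

# approx() interpolates linearly between the observed points
# and evaluates the result at every position 1..5.
interpolated <- approx(x = observed, y = tom[observed],
                       xout = seq_along(tom))$y
print(interpolated)
```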

Just use a loop in this scenario; other approaches are much more complicated.

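    for (i in seq_len(nrow(df))) {
      if (is.na(df[i, 'Tom'])) {
        # tmp holds the current non-missing values, padded with 0 at both ends.
        # Earlier rows are already filled, so tmp[i] is the previous value and
        # tmp[i + 1] is the next non-missing value (or the 0 padding at the end).
        tmp <- c(0, df$Tom[!is.na(df$Tom)], 0)
        df[i, 'Tom'] <- (tmp[i + 1] + tmp[i]) / 2 + tmp[i]
      }
    }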

Example

    df <- data.frame(X = seq_len(100),
                     Tom = ifelse(runif(100, 0, 1) > 0.5, NA, round(runif(100, 0, 10), 1)))
    head(df)
    #   X Tom
    # 1 1  NA
    # 2 2 1.4
    # 3 3  NA
    # 4 4 3.9
    # 5 5  NA

    for (i in seq_len(nrow(df))) {
      if (is.na(df[i, 'Tom']))
        df[i, 'Tom'] <- ((tmp <- c(0, df$Tom[!is.na(df$Tom)], 0))[i+1] + tmp[i]) / 2 + tmp[i]
    }
    head(df)
    #   X  Tom
    # 1 1 0.70
    # 2 2 1.40
    # 3 3 4.05
    # 4 4 3.90
    # 5 5 9.05

Actually, the imputeTS package (of which I am a developer) offers some good solutions for this.

Moving Average Replacement

  na.ma(x, k = 2) 

x is your input object; k is the moving average window.

A k of 1 means that only the values directly before and after are considered; a k of 2 means that the 2 values before and the 2 values after are considered.

This function is probably the closest to the required calculation. The only difference is that the imputeTS method does not jump over NA values (as the original poster requires).

But this makes sense, especially for long runs of NAs. Consider 1, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 14, 15, 16: taking the average of 2 and 14 for the NA at position 3 would not be a good idea.
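To illustrate the "does not jump over NA values" point, here is a deliberately simplified sketch in base R. It is not the imputeTS implementation (the real function has further behaviour, such as weighting options, that this does not attempt to reproduce); the helper name `window_mean_fill` is hypothetical. Each NA is replaced by the plain mean of the observed values within k positions on either side, and stays NA if the window contains no observed value.

```r
# Simplified illustration only, not the imputeTS algorithm.
# Each NA gets the mean of the observed values found within k positions
# before and after it in the original series; NAs in the window are ignored.
window_mean_fill <- function(x, k = 2) {
  out <- x
  for (i in which(is.na(x))) {
    idx <- max(1, i - k):min(length(x), i + k)   # fixed window around i
    vals <- x[idx]
    vals <- vals[!is.na(vals)]
    if (length(vals) > 0) out[i] <- mean(vals)
  }
  out
}

x <- c(1, 2, rep(NA, 11), 14, 15, 16)
res <- window_mean_fill(x, k = 2)
print(res)
```

In this sketch the NA at position 3 is filled from the nearby values 1 and 2 only, so the distant 14 has no influence on it.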

Also Last Observation Carried Forward (as noted in the comment by 42)

 imputeTS::na.locf(x) 

or Interpolation (also mentioned by G. Grothendieck)

 imputeTS::na.interpolation(x) 

There are also other missing data replacement options that go in a similar direction. An introduction to the imputeTS package was published in the R Journal, if you are interested.
