Interpolation within groups

Goal

I want to interpolate within each group of a data frame, so that I get an arbitrary number of intermediate points for every group.

Minimal working example

I have a dataframe like:

    OldDataFrame <- data.frame(
      ID   = c(1, 1, 1, 2, 2, 2),
      time = c(1, 2, 3, 1, 2, 3),
      Var1 = c(-0.6, 0.2, -0.8, 1.6, 0.3, -0.8),
      Var2 = c(0.5, 0.7, 0.6, -0.3, 1.5, 0.4)
    )

I want a function along these lines:

    TimeInterpolateByGroup <- function(DataFrame, GroupingVariable, TimeVariable, TimeInterval) {
      # Something here
    }

It would be convenient if I did not have to specify the columns and the function worked automatically on every numeric column, for example via numcolwise in plyr.

So that I can apply it like this:

 NewDataFrame = TimeInterpolateByGroup(DataFrame = OldDataFrame, GroupingVariable = "ID", TimeVariable = "time", TimeInterval = 0.25) 

to get a NewDataFrame like:

    NewDataFrame = data.frame(
      ID   = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
               2, 2, 2, 2, 2, 2, 2, 2, 2),
      time = c(1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
               1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3),
      Var1 = c(-0.6, -0.4, -0.2, 0, 0.2, -0.05, -0.3, -0.55, -0.8,
               1.6, 1.275, 0.95, 0.625, 0.3, 0.025, -0.25, -0.525, -0.8),
      Var2 = c(0.5, 0.55, 0.6, 0.65, 0.7, 0.675, 0.65, 0.625, 0.6,
               -0.3, 0.15, 0.6, 1.05, 1.5, 1.225, 0.95, 0.675, 0.4)
    )

Or, as an image of what I want:

[Image of the desired result not available]
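
On the wish above that every numeric column be handled automatically: one possible way to pick those columns in base R, without plyr, is sketched below. The helper name numeric_value_columns and the example_df data are hypothetical and only serve as illustration.

```r
# Hypothetical helper (not from the question): return the names of all numeric
# columns of df, leaving out the columns listed in `exclude`
# (for example the grouping and time columns).
numeric_value_columns <- function(df, exclude = character(0)) {
  is_num <- vapply(df, is.numeric, logical(1))
  setdiff(names(df)[is_num], exclude)
}

# Made-up example data with one non-numeric column
example_df <- data.frame(
  ID    = c(1, 1),
  time  = c(1, 2),
  Var1  = c(-0.6, 0.2),
  label = c("a", "b")
)

value_cols <- numeric_value_columns(example_df, exclude = c("ID", "time"))
print(value_cols)
```

Only Var1 is selected here, because label is a character column and ID and time are excluded explicitly.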

A related question that doesn't quite work

Interpolate variables on subsets of a data frame

  • This uses a plyr-based approach, which seems to be the right direction, but the example is intricate and does not allow an arbitrary number of intermediate interpolation points. That matters for an animation application (see below), where I am not sure how many intermediate time points I will need for a smooth animation.

Some other answers use a time series approach, but that does not allow splitting by group.

I also considered using a longitudinal data package, but this seems unnecessarily complex for what should be a simple problem.

Desired Application

I want an x-y plot of Var1 against Var2 in which each point is one entry at time = 1. I then want to use the animate package to see how the points move over time. To do this smoothly, I need the coordinates of every point at all the intermediate time points.
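
As a rough sketch of how interpolated data could feed such an animation, the snippet below splits a data frame by time and draws one base R plot per time step. The frames_df data is a made-up, coarser stand-in for NewDataFrame, and no particular animation package is assumed; each plot would simply become one frame.

```r
# Made-up stand-in for the interpolated data (same layout as NewDataFrame)
frames_df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  time = c(1, 1.5, 2, 1, 1.5, 2),
  Var1 = c(-0.6, -0.2, 0.2, 1.6, 0.95, 0.3),
  Var2 = c(0.5, 0.6, 0.7, -0.3, 0.6, 1.5)
)

# One data frame per time step; each holds the coordinates of every ID
frames <- split(frames_df, frames_df$time)

# Draw one plot per time step with fixed axis limits, so that the points
# appear to move rather than the axes
for (frame in frames) {
  plot(frame$Var1, frame$Var2,
       xlim = range(frames_df$Var1), ylim = range(frames_df$Var2),
       xlab = "Var1", ylab = "Var2",
       main = paste("time =", frame$time[1]))
}
```

A smaller TimeInterval gives more frames and therefore smoother motion between the observed time points.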

2 answers

I believe the code below gives the correct answer, apart from a tiny amount of numerical inaccuracy that comes from using the approx() function. The basic idea is to use ddply to split and recombine the data frame, and approx to do the interpolation.

    library(plyr)

    # time_interpolate is a helper function for TimeInterpolateByGroup
    # that operates on each of the groups. In the input to this function,
    # the GroupingVariable column of the data frame should be single-valued.
    # The function returns a (probably longer) data frame, with estimated
    # values for the times specified in the output_times array.
    time_interpolate <- function(data_frame, GroupingVariable, time_var, output_times) {
      input_times <- data_frame[, time_var]
      exclude_vars <- c(time_var, GroupingVariable)
      value_vars <- setdiff(colnames(data_frame), exclude_vars)
      output_df <- data.frame(rep(data_frame[1, GroupingVariable], length(output_times)),
                              output_times)
      colnames(output_df) <- c(GroupingVariable, time_var)
      for (value_var in value_vars) {
        output_df[, value_var] <- approx(input_times, data_frame[, value_var], output_times)$y
      }
      return(output_df)
    }

    # A test for time_interpolate
    time_interpolate(OldDataFrame[1:3, ], "ID", "time", seq(from = 1, to = 3, by = 0.25))

    TimeInterpolateByGroup <- function(DataFrame, GroupingVariable, TimeVariable, TimeInterval) {
      min_time <- min(DataFrame[, TimeVariable])
      max_time <- max(DataFrame[, TimeVariable])
      output_times <- seq(from = min_time, to = max_time, by = TimeInterval)
      ddply(DataFrame, GroupingVariable, time_interpolate,
            GroupingVariable = GroupingVariable,
            time_var = TimeVariable,
            output_times = output_times)
    }
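
To make the interpolation step concrete, here is a minimal sketch of what approx() does for a single column of a single group, using the Var1 values of ID 1 from the question. The variable names are chosen only for this illustration.

```r
# Known points: Var1 for ID 1 at times 1, 2 and 3 (taken from the question)
times  <- c(1, 2, 3)
values <- c(-0.6, 0.2, -0.8)

# Times at which interpolated values are wanted
out_times <- seq(from = 1, to = 3, by = 0.25)

# approx() returns a list with components x and y;
# y holds the linearly interpolated values at the requested times
interp <- approx(x = times, y = values, xout = out_times)
print(interp$y)
```

Up to floating-point rounding, this reproduces the first nine Var1 values of the desired NewDataFrame. With its default settings, approx() returns NA for requested times that lie outside the range of the input times, so a group that covers a narrower time range than the whole data frame would get NA rows at the ends.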

You can also use na.approx from the zoo package.

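    library(zoo)

    my_fun <- function(DataFrame, GroupingVariable, TimeVariable, TimeInterval) {
      do.call(rbind, by(DataFrame, DataFrame[, GroupingVariable], function(dat) {
        # Regular time grid for this group, in a column named after TimeVariable
        tt <- data.frame(seq(from = min(dat[, TimeVariable]),
                             to = max(dat[, TimeVariable]),
                             by = TimeInterval))
        names(tt) <- TimeVariable
        # Left join: times missing from the original data get NA in every other column
        dat2 <- merge(tt, dat, all.x = TRUE)
        # Fill the NAs by linear interpolation
        na.approx(dat2)
      }))
    }

    my_fun(OldDataFrame, "ID", "time", 0.25)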
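
As a small illustration of the filling step, the sketch below applies na.approx to a single vector with gaps. The values are the Var2 endpoints of ID 1 between time 1 and time 2 in the question; the variable names are made up for this example.

```r
library(zoo)

# Var2 for ID 1 at time 1 and time 2, with three missing intermediate steps
with_gaps <- c(0.5, NA, NA, NA, 0.7)

# na.approx replaces interior NAs by linear interpolation
filled <- na.approx(with_gaps)
print(filled)
```

By default na.approx interpolates against the position of each element, which matches interpolation over time here only because the merged rows are evenly spaced. For unevenly spaced rows, the actual times can be supplied through its x argument.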