I wrote the code used to organize data sampled at different frequencies, but I made extensive use of for-loops, which significantly slow down the code on a large data set. I looked through my code and found ways to remove most of the for-loops to speed it up, but one of the loops puzzled me.
As an example, suppose the data was recorded at 3 Hz, so I get three rows for every second of data. However, variables A, B, and C are each sampled at 1 Hz, so I get one value every three rows for each of them. The variables are sampled sequentially within each one-second period, which gives the data its diagonal structure.
To complicate matters even further, sometimes a row is lost in the original dataset.
My goal is this: having identified the rows that I want to keep, I want to move the non-NA values from the subsequent rows into those keeper rows. If not for the lost rows, I would always keep the row containing the value for the first variable, but if one of those rows is lost, I keep the next available row instead.
In the example below, the sixth sample and tenth sample are lost.
A <- c(1, NA, NA, 4, NA, 7, NA, NA, NA, NA)
B <- c(NA, 2, NA, NA, 5, NA, 8, NA, 11, NA)
C <- c(NA, NA, 3, NA, NA, NA, NA, 9, NA, 12)
test_df <- data.frame(A = A, B = B, C = C)
test_df
    A  B  C
1   1 NA NA
2  NA  2 NA
3  NA NA  3
4   4 NA NA
5  NA  5 NA
6   7 NA NA
7  NA  8 NA
8  NA NA  9
9  NA 11 NA
10 NA NA 12

keep_rows <- c(1, 4, 6, 9)
After moving the values into the keeper rows, I delete the intermediate rows, which gives the following result:
test_df <- test_df[keep_rows, ]
test_df
   A  B  C
1  1  2  3
2  4  5 NA
3  7  8  9
4 NA 11 12
In the end, I need only one row for every second of data, and NA values should remain only where the corresponding row of source data was lost.
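For context, here is one vectorized sketch of what I am after, assuming `keep_rows` marks the first row of each one-second group: label every row with the keeper row that starts its group via `cumsum`, then collapse each group with `aggregate`, taking the first non-NA value per column. The helper `first_non_na` is my own name, not a built-in.

```r
A <- c(1, NA, NA, 4, NA, 7, NA, NA, NA, NA)
B <- c(NA, 2, NA, NA, 5, NA, 8, NA, 11, NA)
C <- c(NA, NA, 3, NA, NA, NA, NA, 9, NA, 12)
test_df <- data.frame(A = A, B = B, C = C)
keep_rows <- c(1, 4, 6, 9)

# Label each row with the index of the keeper row that starts its group:
# rows 1-3 -> group 1, rows 4-5 -> group 2, rows 6-8 -> group 3, rows 9-10 -> group 4
grp <- cumsum(seq_len(nrow(test_df)) %in% keep_rows)

# First non-NA value in a vector; returns NA when the group has none
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse each group to one row, dropping the grouping column afterwards
result <- aggregate(test_df, by = list(grp), FUN = first_non_na)[, -1]
result
```

I am not sure whether `aggregate` with a per-group extraction like this is actually faster than my loop on a large data set, so a more efficient alternative would still be welcome.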
Does anyone have any ideas on how to move the data without using a for loop? I would be grateful for any help! Sorry if this question is too verbose; I wanted to err on the side of too much information rather than too little.