In R combine two data frames, fill in the blanks

Say I have these two data frames:

    big.table <- data.frame("idx" = 1:100)
    small.table <- data.frame("idx" = sample(1:100, 10), "color" = sample(colors(), 10))

I want to combine them as follows:

    merge(small.table, big.table, by = "idx", all.y=TRUE)
        idx   color
    1     1    <NA>
    2     2    <NA>
    3     3 salmon2
    4     4    <NA>
    5     5    <NA>
    6     6    <NA>
    ...
    20   20    <NA>
    21   21    <NA>
    22   22   blue4
    23   23  grey99
    24   24    <NA>
    25   25    <NA>
    26   26    <NA>
    ...

Now I need to fill in the "color" column so that every NA is set to the value that precedes it in the table.

NOTES: The real problem involves a log file written by a computer program in a non-standard format. Blocks of lines in this log file belong to a "process" that is identified in the first line of the block. I extracted the information from the relevant lines of the log file, most of which belong to a process, and built a data table from it (line number, timestamp, etc.). Now I need to fill in that table with the process name for each row, using small.table, which holds the line number at which each process starts.

The rows at the top of the large table may not have a “process” (color in the example above). These lines must remain NA.

As soon as the first “process” begins, every line between its starting line and the starting line of the next process belongs to the first process. When the second process begins, every line between its starting line and the starting line of the next process belongs to the second process, and so on. A process starting line never has the same line number as any of the other lines I collected from the log file.

My plan is to make big.table the sequence of all line numbers in the log and merge small.table into it. Then I can "fill in" the process name and merge big.table with the log-line data, keeping only the log lines, each with its associated process.

I am open to other approaches.

2 answers

It looks like you need na.locf from the zoo package (the name stands for "last observation carried forward"):

    library(zoo)
    tbl <- merge(small.table, big.table, by = "idx", all.y=TRUE)
    tbl$color2 <- na.locf(tbl$color, na.rm = FALSE)
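
If installing zoo is not an option, the same carry-forward idea can be sketched in base R. This is only an illustration: fill_forward is a hypothetical helper name, not part of zoo or base R, and it assumes a plain character (or other atomic) vector rather than a factor. Like na.locf with na.rm = FALSE, it leaves leading NAs untouched.

```r
# Hypothetical helper (not from zoo): carry the last non-NA value forward.
# Assumes x is an atomic vector such as character, not a factor.
fill_forward <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) counts the non-NA values seen so far; 0 means none yet
  grp <- cumsum(seen)
  # Prepend NA so positions before the first non-NA value stay NA
  c(NA, x[seen])[grp + 1]
}

# Made-up example vector, loosely modeled on the question's output
x <- c(NA, NA, "salmon2", NA, NA, "blue4", "grey99", NA)
filled <- fill_forward(x)
```

If the color column is a factor (the default for data.frame in R versions before 4.0), converting it with as.character first should keep the helper's output readable.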

A data.table solution:

    require(data.table)
    b <- data.table(big.table, key="idx")
    s <- data.table(small.table, key="idx")
    s[b, roll=TRUE]
    #      idx        color
    #   1:   1           NA
    #   2:   2           NA
    #   3:   3           NA
    #   4:   4        blue3
    #   5:   5        blue3
    #   6:   6        blue3
    #   7:   7        blue3
    #   8:   8        blue3
    #   9:   9        blue3
    #  10:  10        blue3
    #  11:  11 navajowhite1
    #  12:  12 navajowhite1
    # . . . .
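
For comparison, a rolling lookup of this kind can also be approximated in base R with findInterval, which skips building the full merged table. The sketch below uses made-up data (the starts and line_no names and their values are assumptions, not taken from the question) and assumes the color column is character, not factor.

```r
# Assumed example data: the line where each process starts, and its name
starts <- data.frame(idx = c(3, 22, 23),
                     color = c("salmon2", "blue4", "grey99"),
                     stringsAsFactors = FALSE)
starts <- starts[order(starts$idx), ]  # findInterval() needs sorted breakpoints

line_no <- 1:26  # every line number of the (hypothetical) log

# For each line, the index of the last start at or before it; 0 if none yet
pos <- findInterval(line_no, starts$idx)

# Prepend NA so lines before the first process stay NA
filled <- data.frame(idx = line_no,
                     color = c(NA, starts$color)[pos + 1],
                     stringsAsFactors = FALSE)
```

Lines 1 and 2 come before any process start, so their color stays NA, which matches the requirement stated in the question.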