I would like to combine the rows of the data frame so that the ranges described by the "start" and "end" columns include all the values ββfrom the original dataset. There may be overlays, repeats, and nested ranges. Some ranges may be missing.
Here is an example of the data I would like to collapse:
data = data.frame(rbind( c("Roger", 1, 10), c("Roger", 10, 15), c("Roger", 16, 17), c("Roger", 3, 6), c("Roger", 20, 25), c("Roger", NA, NA), c("Susan", 2, 8))) names(data) = c("name", "start", "end") data$start = as.numeric(as.character(data$start)) data$end = as.numeric(as.character(data$end))
Desired Result:
name start end Roger 1 17 Roger 20 25 Susan 2 8
My attempt is to expand each element in a range for each row. This works, but then I'm not sure how to compress it back. In addition, the complete dataset I'm working with has ~ 30 million rows and very large ranges, so this method is VERY slow.
pb <- txtProgressBar(min = 0, max = length(data$name), style = 3) mylist = list() for(i in 1:length(data$name)){ subdata = data[i,] if(is.na(subdata$start)){ mylist[[i]] = subdata mylist[[i]]$each = NA } if(!is.na(subdata$start)){ sequence = seq(subdata$start, subdata$end) mylist[[i]] = subdata[rep(1, each = length(sequence)),] mylist[[i]]$daily = sequence } setTxtProgressBar(pb, i) } rbindlist(mylist)