In R: Replace the value of a data frame column with a value from another data frame when a "between" condition is matched

I have two data frames:

set.seed(343)
testDF <- data.frame(Score = sample(50, size=50, replace=TRUE), number = rep(letters[1:25],2), Rev = rep(0,50))
sourceDF <- data.frame(min = c(1,10,20,30,40), max = c(9, 19, 29, 39, 50), rev = 1:5)

For each row of testDF where testDF$Score is between sourceDF$min and sourceDF$max, replace the value of testDF$Rev with the corresponding sourceDF$rev.

I have it working with two for loops and an if condition, but this is ... slow (my dataset has about 1 million rows). I tried using findInterval without success.

Is there a better / more efficient way to do this?

+4
2 answers

First, see my comment on how to improve your question and make it reproducible. Second, here is a possible approach that performs a fast interval match using data.table::foverlaps:

library(data.table)
setkey(setDT(testDF)[, Score2 := Score], Score, Score2) # create bounds and key
setkey(setDT(sourceDF), min, max) # Key by min, max
indx <- foverlaps(sourceDF, testDF, nomatch = 0L, which = TRUE) # run foverlaps
testDF[indx$yid,  Rev := sourceDF[indx$xid, rev]] # Update in place by corresponding values
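
For comparison, here is a minimal base R sketch using findInterval, which the question mentions trying. It is not part of the answer above, and it assumes that sourceDF is sorted by min and that its intervals do not overlap. The names idx and hit are introduced here for illustration only.

```r
# Example data from the question
set.seed(343)
testDF <- data.frame(Score = sample(50, size = 50, replace = TRUE),
                     number = rep(letters[1:25], 2),
                     Rev = rep(0, 50))
sourceDF <- data.frame(min = c(1, 10, 20, 30, 40),
                       max = c(9, 19, 29, 39, 50),
                       rev = 1:5)

# For each Score, index of the last interval whose lower bound is <= Score
# (0 when Score is below the first lower bound).
# Assumption: sourceDF$min is sorted in increasing order.
idx <- findInterval(testDF$Score, sourceDF$min)
idx[idx == 0L] <- NA

# Keep only scores that also respect the upper bound of the matched interval,
# so gaps between intervals are left untouched.
hit <- !is.na(idx) & testDF$Score <= sourceDF$max[idx]
testDF$Rev[hit] <- sourceDF$rev[idx[hit]]
```

Rows whose Score falls outside every interval keep their original Rev. Unlike the data.table version, this does not reorder testDF.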
+5

. , , . ... @David foverlaps ( ).

Since sourceDF is small (5 rows), I split testDF into 5 data frames, filtering (with dplyr) on Score.

library(dplyr)
testDF1 <- filter(testDF, Score >= 1 & Score <= 9) ## First DF: rows in the first interval

Then I set Rev for that subset.

testDF1$Rev <- sourceDF$rev[1]
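
The filter-and-assign step above can also be written as a short loop over the rows of sourceDF, without creating separate data frames. This is only a sketch in base R, using the example data from the question; the name in_range is introduced here for illustration. It loops over the 5 intervals, not over the rows of testDF, so each iteration is a vectorized comparison.

```r
# Example data from the question
set.seed(343)
testDF <- data.frame(Score = sample(50, size = 50, replace = TRUE),
                     number = rep(letters[1:25], 2),
                     Rev = rep(0, 50))
sourceDF <- data.frame(min = c(1, 10, 20, 30, 40),
                       max = c(9, 19, 29, 39, 50),
                       rev = 1:5)

# One pass per interval: mark the rows of testDF whose Score lies in
# [min, max] and overwrite Rev for those rows only.
for (i in seq_len(nrow(sourceDF))) {
  in_range <- testDF$Score >= sourceDF$min[i] & testDF$Score <= sourceDF$max[i]
  testDF$Rev[in_range] <- sourceDF$rev[i]
}
```

This stays practical only while sourceDF has few rows; with many intervals, an interval join such as foverlaps is likely the better fit.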

1 1h35mn 800k + .

0
