One option is to generate the sample() for all Seeds of each row of df in a single call.
Using set.seed(1) before your loop-based code, I get:
    > df
       Density  Seeds SeedsOnRoad
    1        0      0           0
    2        0      0           0
    3        0      0           0
    4        3   1500         289
    5        0      0           0
    6      120  60000       12044
    7      300 150000       29984
    8      120  60000       12079
    9        0      0           0
    10       0      0           0
I get the same answer in a fraction of the time if I do this:
    set.seed(1)
    tmp <- sapply(df$Seeds, function(x) sum(sample(SeedRainDists, x, replace = TRUE) > 40))
    > tmp
     [1]     0     0     0   289     0 12044 29984 12079     0     0
For comparison:
    df <- transform(df, GavSeedsOnRoad = tmp)
    > df
       Density  Seeds SeedsOnRoad GavSeedsOnRoad
    1        0      0           0              0
    2        0      0           0              0
    3        0      0           0              0
    4        3   1500         289            289
    5        0      0           0              0
    6      120  60000       12044          12044
    7      300 150000       29984          29984
    8      120  60000       12079          12079
    9        0      0           0              0
    10       0      0           0              0
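To get a rough feel for the speed difference, here is a minimal sketch that times both approaches. The original SeedRainDists and loop code are not shown in this answer, so `SeedRainDists`, the small `df`, and the `loop_count()` helper below are hypothetical stand-ins; the timings will vary by machine.

```r
# Hypothetical stand-ins: the real SeedRainDists and df are not shown here.
set.seed(42)
SeedRainDists <- rexp(1000, rate = 1 / 25)     # assumed dispersal distances
df <- data.frame(Density = c(0, 3, 0, 40, 12))
df$Seeds <- df$Density * 500                   # same Density-to-Seeds ratio as the table

# Assumed shape of the loop-based code: one sample() call per seed.
loop_count <- function(n) {
  out <- 0
  for (i in seq_len(n)) {
    out <- out + ifelse(sample(SeedRainDists, 1, replace = TRUE) > 40, 1, 0)
  }
  out
}

set.seed(1)
t_loop <- system.time(by_loop <- sapply(df$Seeds, loop_count))

# One sample() call per row, asking for all of that row's seeds at once.
set.seed(1)
t_vec <- system.time(
  by_vec <- sapply(df$Seeds,
                   function(x) sum(sample(SeedRainDists, x, replace = TRUE) > 40))
)

# Elapsed seconds for each approach, and a check that the counts agree.
print(c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]]))
print(all(by_loop == by_vec))
```

Resetting the seed before each run is what makes the two sets of counts comparable.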
The following points should be noted here:
- Try to avoid calling a function repeatedly in a loop if the function is vectorised or can generate the entire end result in a single call. You were calling sample() Seeds times for each row of df, with each call returning a single sample from SeedRainDists. I instead make one sample() call per row of df, asking for a sample of size Seeds, so I call sample() 10 times where your code called it 271500 times.
- Even if you do have to call a function repeatedly in a loop, move out of the loop anything vectorised that could be done on the entire result after the loop has finished. An example here is your accumulation of SeedsOut, which calls +() a large number of times. It would be better to collect each SeedsOut in a vector and then sum() that vector outside the loop. For example:
    # x is the number of seeds for the current row of df
    SeedsOut <- numeric(length = x)
    for (i in seq_len(x)) {
        SeedsOut[i] <- ifelse(sample(SeedRainDists, 1, replace = TRUE) > 40, 1, 0)
    }
    sum(SeedsOut)  # sum once, outside the loop
Note that R treats a logical as if it were a numeric 0 or 1 when it is used in any mathematical function. Hence

    sum(ifelse(sample(SeedRainDists, 100, replace=TRUE)>40,1,0))

and

    sum(sample(SeedRainDists, 100, replace=TRUE)>40)

would give the same result if run with the same set.seed().
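As a quick check of that equivalence, the sketch below runs both forms from the same seed. `SeedRainDists` here is a hypothetical vector of distances, since the real one is not shown in this answer.

```r
# Hypothetical stand-in for the real SeedRainDists.
set.seed(42)
SeedRainDists <- rexp(1000, rate = 1 / 25)

# Logicals are coerced to 0/1 when summed.
print(sum(c(TRUE, FALSE, TRUE)))

# Same seed, same draws: only the 0/1 conversion differs.
set.seed(1)
a <- sum(ifelse(sample(SeedRainDists, 100, replace = TRUE) > 40, 1, 0))
set.seed(1)
b <- sum(sample(SeedRainDists, 100, replace = TRUE) > 40)

print(a == b)
```

Dropping the ifelse() saves a pass over the vector and reads more directly.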
There may be a fancier way of doing the sampling that requires fewer calls to sample() (and there is: sample(SeedRainDists, sum(Seeds), replace = TRUE) > 40, but then you need to take care of selecting the right elements of that vector for each row of df - not hard, just a bit cumbersome), but what I show may well be efficient enough?
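For what it is worth, one possible way to handle that bookkeeping is sketched below. It is only an illustration: `SeedRainDists` and `Seeds` are hypothetical stand-ins, and using rep() plus tabulate() to assign draws to rows is my choice here, not the only option.

```r
# Hypothetical stand-ins for the real data.
set.seed(42)
SeedRainDists <- rexp(1000, rate = 1 / 25)
Seeds <- c(0, 1500, 0, 6000, 20000)

# A single sample() call covering every seed in every row.
set.seed(1)
hits <- sample(SeedRainDists, sum(Seeds), replace = TRUE) > 40

# Label each draw with the row it belongs to; rows with 0 seeds get no labels.
row_id <- rep(seq_along(Seeds), times = Seeds)

# Count the hits per row; nbins keeps the zero-seed rows in the result.
one_call <- tabulate(row_id[hits], nbins = length(Seeds))

# Compare with the one-call-per-row version from the same seed.
set.seed(1)
per_row <- sapply(Seeds,
                  function(x) sum(sample(SeedRainDists, x, replace = TRUE) > 40))

print(all(one_call == per_row))
```

The draws are consumed in row order, which is why labelling them with rep() lines up with the per-row version.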