Running total with different denominators in R

OK, here is a problem that I would like to solve with an efficient, elegant tool like data.table or dplyr.

Define:

    library(data.table)
    DT = data.table(time = c(1:3, 1:5),
                    group = c(rep("A", 3), rep("B", 5)),
                    value = c(2, 9, 2, 3, 4, 1, 0, 3))

       time group value
    1:    1     A     2
    2:    2     A     9
    3:    3     A     2
    4:    1     B     3
    5:    2     B     4
    6:    3     B     1
    7:    4     B     0
    8:    5     B     3

What I'm trying to get is, for each row, the sum of the group's values observed so far, each divided by its reverse order at the point in time when it was observed.

       time group value    RESULT
    1:    1     A     2  2.000000
    2:    2     A     9 10.000000
    3:    3     A     2  7.166667
    4:    1     B     3  3.000000
    5:    2     B     4  5.500000
    6:    3     B     1  4.000000
    7:    4     B     0  2.583333
    8:    5     B     3  4.933333

In row 5, the result is 4/1 + 3/2 = 5.5: since group B had 2 observations up to time 2, the latest is divided by 1 and the previous one by 2. Next, in row 6, the result is 1/1 + 4/2 + 3/3 = 4: at time 3, group B had 3 observations, so the latest is divided by 1, the previous by 2, and the one before that by 3. In row 7, 0/1 + 1/2 + 4/3 + 3/4 = 2.583333, and so on.
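In other words, for the j-th observation of a group, RESULT[j] is the sum over i = 1..j of value[i] / (j - i + 1). As a quick sanity check of row 7 (group B at time 4) in plain R:

    # Row 7: group B values observed up to time 4; the newest is divided by 1, the oldest by 4
    b_vals <- c(3, 4, 1, 0)
    sum(b_vals / rev(seq_along(b_vals)))   # 3/4 + 4/3 + 1/2 + 0/1 = 2.583333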

The data is big, so avoiding loops is very important!

+7
r data.table dplyr data-manipulation
3 answers

I would use matrix algebra:

    n_max = DT[, .N, by=group][, max(N)]
    m = matrix(0, n_max, n_max)
    m[] = ifelse(col(m) >= row(m), 1 / (col(m) - row(m) + 1), m)
    DT[, res := value %*% m[seq_len(.N), seq_len(.N)], by=group]

       group value       res
    1:     A     2  2.000000
    2:     A     9 10.000000
    3:     A     2  7.166667
    4:     B     3  3.000000
    5:     B     4  5.500000
    6:     B     1  4.000000
    7:     B     0  2.583333
    8:     B     3  4.933333
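To see why this works: m is upper triangular with m[i, j] = 1/(j - i + 1), so column j holds exactly the divisors needed for row j, and value %*% m computes every running sum in a single multiplication. Printing it for this data (n_max = 5, rounded; my own illustration of the matrix the code builds):

    round(m, 3)
    #      [,1] [,2]  [,3]  [,4]  [,5]
    # [1,]    1  0.5 0.333 0.250 0.200
    # [2,]    0  1.0 0.500 0.333 0.250
    # [3,]    0  0.0 1.000 0.500 0.333
    # [4,]    0  0.0 0.000 1.000 0.500
    # [5,]    0  0.0 0.000 0.000 1.000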
+6

You can *apply over a sequence of the group lengths, building one sequence to index value and its reverse to divide by. Using dplyr:

    library(tidyverse)

    DT %>% group_by(group) %>%
        mutate(result = sapply(seq(n()), function(x){sum(value[seq(x)] / rev(seq(x)))}))

    ## Source: local data frame [8 x 3]
    ## Groups: group [2]
    ##
    ##    group value    result
    ##   <fctr> <dbl>     <dbl>
    ## 1      A     2  2.000000
    ## 2      A     9 10.000000
    ## 3      A     2  7.166667
    ## 4      B     3  3.000000
    ## 5      B     4  5.500000
    ## 6      B     1  4.000000
    ## 7      B     0  2.583333
    ## 8      B     3  4.933333

or, using purrr::map_dbl instead of sapply,

    DT %>% group_by(group) %>%
        mutate(result = map_dbl(seq(n()), ~sum(value[seq(.x)] / rev(seq(.x)))))

which returns the same thing. You can translate the same logic into base R:

    DT$result <- ave(DT$value, DT$group,
                     FUN = function(v){sapply(seq_along(v),
                                              function(x){sum(v[seq(x)] / rev(seq(x)))})})
    DT

    ##   group value    result
    ## 1     A     2  2.000000
    ## 2     A     9 10.000000
    ## 3     A     2  7.166667
    ## 4     B     3  3.000000
    ## 5     B     4  5.500000
    ## 6     B     1  4.000000
    ## 7     B     0  2.583333
    ## 8     B     3  4.933333

While I have not benchmarked them, these methods should be fast enough for most tasks. If speed is paramount, I suspect @Frank's answer is probably faster.
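If you want to check on your own data, a minimal benchmarking sketch with the microbenchmark package (assuming m from the matrix answer above has already been built) could look like this:

    library(microbenchmark)

    microbenchmark(
        matrix = DT[, res := value %*% m[seq_len(.N), seq_len(.N)], by = group],
        base   = ave(DT$value, DT$group,
                     FUN = function(v) sapply(seq_along(v),
                                              function(x) sum(v[seq(x)] / rev(seq(x))))),
        times = 100
    )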

+3

If you have spare memory, you can use a Cartesian join to preallocate the rows, so the operations performed internally are simpler and can take advantage of data.table's GForce optimization. It may not be faster than the other solutions, since it mainly trades memory for more optimized internal code.

    DT[, .SD
       ][DT, on='group', allow.cartesian=TRUE
       ][, setnames(.SD, 'i.time', 'groupRow')
       ][time <= groupRow
       ][, timeRev := .N:1, .(group, groupRow)
       ][, res := value / timeRev
       ][, .(res = sum(res)), .(group, groupRow, i.value)
       ][, groupRow := NULL
       ][, setnames(.SD, 'i.value', 'value')
       ]

       group value    res
    1:     A     2  2.000
    2:     A     9 10.000
    3:     A     2  7.167
    4:     B     3  3.000
    5:     B     4  5.500
    6:     B     1  4.000
    7:     B     0  2.583
    8:     B     3  4.933
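To confirm that GForce is actually applied to the aggregation step, you can re-run just that step with verbose=TRUE (a quick check on an intermediate result, here assumed to be saved in a variable called tmp):

    # tmp: the chain up to the res := value / timeRev step (hypothetical name)
    tmp[, .(res = sum(res)), by = .(group, groupRow, i.value), verbose = TRUE]
    # the verbose log should report that j was optimized by GForce for the sum()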
+2
