Multidimensional array multiplication in R

Question

I would like to do some complex multidimensional array multiplication, where I multiply along specific dimensions of the arrays.

Consider this example, where the prevalence of two groups (A and B) is given by some population characteristics (age and gender):

    # setup data
    random <- runif(4)
    group.prevalence <- aperm(
      array(c(random, 1 - random),
            dim = c(2, 2, 2),
            dimnames = list(age = c("young", "old"),
                            gender = c("male", "female"),
                            group = c("A", "B"))),
      c(3, 1, 2))
    group.prevalence  # A + B = 1

Suppose now that I have a population of interest ...

    population <- round(
      array(runif(4, min = 100, max = 200) %o%
              c(1, 1 * (1 + random[1]), 1 * (1 + random[1])^2),
            dim = c(2, 2, 3),
            dimnames = list(age = c("young", "old"),
                            gender = c("male", "female"),
                            year = c("year1", "year2", "year3"))))
    population

... for which I would like to calculate the prevalence of "A" and "B".

A bad solution would be to fill all this in a loop:

    # bad solution
    grouped.population <- array(NA, dim = c(2, 2, 2, 3),
      dimnames = list(group = c("A", "B"),
                      age = c("young", "old"),
                      gender = c("male", "female"),
                      year = c("year1", "year2", "year3")))
    for (group in c("A", "B"))
      for (gender in c("male", "female"))
        for (age in c("young", "old"))
          grouped.population[group, age, gender, ] <-
            group.prevalence[group, age, gender] * population[age, gender, ]

But I suppose some kind of apply function might come in handy, perhaps plyr's aaply, because the dimensions of the result should be preserved. I tried:

    library(plyr)
    aaply(population, c(1, 2), function(x) x * group.prevalence)  # too many dimensions

I welcome any suggestions.

r multidimensional-array

mzuba Sep 01 '16 at 12:25

1 answer

aichao · Answer 1 · 2016-09-01T15:02:47+0000

In your specific case, you can calculate:

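    library(dplyr)  # assumption: first() and last() are not base R; dplyr is one package that provides them
    out <- rep(group.prevalence, times = last(dim(population))) *
           rep(population, each = first(dim(group.prevalence)))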

and then you can set the dimensions of this array:

    array(out, dim = c(2, 2, 2, 3),
          dimnames = list(group = c("A", "B"),
                          age = c("young", "old"),
                          gender = c("male", "female"),
                          year = c("year1", "year2", "year3")))
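To see why the two rep() calls line up, here is a minimal sketch on two tiny matrices. The arrays left and right and their values are made up for illustration and are not the question's data; they share only the age dimension.

```r
# Toy arrays (hypothetical values): left is (group, age), right is (age, year)
left <- array(1:4, dim = c(2, 2),
              dimnames = list(group = c("A", "B"), age = c("young", "old")))
right <- array(c(10, 20, 100, 200), dim = c(2, 2),
               dimnames = list(age = c("young", "old"), year = c("year1", "year2")))

# 'times' cycles the whole left array once per year:
#   1 2 3 4 1 2 3 4
# 'each' holds every right-hand element fixed while group varies:
#   10 10 20 20 100 100 200 200
out <- array(rep(left, times = 2) * rep(right, each = 2),
             dim = c(2, 2, 2),
             dimnames = c(dimnames(left), dimnames(right)["year"]))

# Spot check one cell against direct indexing
stopifnot(out["B", "old", "year2"] == left["B", "old"] * right["old", "year2"])
```

Because R stores arrays in column-major order, the first dimension varies fastest, which is why the extra left-hand dimension must come first and the extra right-hand dimension last.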

The key is to align the dimensions of the two arrays, using dimension permutation and expansion/replication to fill in the dimensions that exist only in the other array. In general, the procedure is as follows:

1. Identify the overlapping dimensions. Here, (age, gender).
2. For the left-hand argument of the multiplication, group.prevalence, permute the dimensions (using aperm) so that all non-overlapping dimensions (i.e. group) come first. Then repeat this array N times (using times), where N is the total size of the non-overlapping dimensions (i.e. year) of the right-hand argument, population.
3. For the right-hand argument of the multiplication, population, permute the dimensions so that all non-overlapping dimensions (i.e. year) come last. Then replicate each element of the array M times (using each), where M is the total size of the non-overlapping dimensions (i.e. group) of the left-hand argument, group.prevalence.
4. Then simply multiply the two replicated arrays element-wise, which is vectorized and fast.
5. The dimensions of the result are the non-overlapping dimensions of the left-hand argument, followed by the overlapping dimensions, followed by the non-overlapping dimensions of the right-hand argument (i.e. (group, age, gender, year)). You can then permute these dimensions as needed to get the output layout you want.
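The steps above can be sketched as a small helper. This is only a sketch under assumptions: the function name multiply_by_dimnames is hypothetical, both arrays must carry named dimnames, and shared dimensions are matched by name and must have equal extents (the labels within a dimension are not checked).

```r
# Hypothetical helper: multiply two arrays, matching dimensions by dimnames name
multiply_by_dimnames <- function(lhs, rhs) {
  lhs_names <- names(dimnames(lhs))
  rhs_names <- names(dimnames(rhs))
  stopifnot(!is.null(lhs_names), !is.null(rhs_names))

  # Step 1: overlapping and non-overlapping dimensions
  shared   <- intersect(lhs_names, rhs_names)
  lhs_only <- setdiff(lhs_names, shared)
  rhs_only <- setdiff(rhs_names, shared)
  stopifnot(identical(dim(lhs)[match(shared, lhs_names)],
                      dim(rhs)[match(shared, rhs_names)]))

  # Steps 2 and 3: extra dimensions first on the left, last on the right
  lhs <- aperm(lhs, match(c(lhs_only, shared), lhs_names))
  rhs <- aperm(rhs, match(c(shared, rhs_only), rhs_names))
  lhs_extra <- head(dim(lhs), length(lhs_only))
  rhs_extra <- tail(dim(rhs), length(rhs_only))

  # Step 4: replicate to a common length and multiply element-wise
  values <- rep(lhs, times = prod(rhs_extra)) * rep(rhs, each = prod(lhs_extra))

  # Step 5: result dimensions are (lhs-only, shared, rhs-only)
  array(values,
        dim = c(dim(lhs), rhs_extra),
        dimnames = c(dimnames(lhs), dimnames(rhs)[rhs_only]))
}

# Usage with arrays shaped like the question's (values are arbitrary)
prev <- array((1:8) / 10, dim = c(2, 2, 2),
              dimnames = list(group = c("A", "B"),
                              age = c("young", "old"),
                              gender = c("male", "female")))
pop <- array(101:112, dim = c(2, 2, 3),
             dimnames = list(age = c("young", "old"),
                             gender = c("male", "female"),
                             year = c("year1", "year2", "year3")))
res <- multiply_by_dimnames(prev, pop)
names(dimnames(res))
```

Matching by name rather than by position means the inputs can arrive in any dimension order; the helper permutes them itself before replicating.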

To verify:

    library(dplyr)  # assumption: first() and last() are not base R; dplyr is one package that provides them

    # bad solution
    grouped.population <- array(NA, dim = c(2, 2, 2, 3),
      dimnames = list(group = c("A", "B"),
                      age = c("young", "old"),
                      gender = c("male", "female"),
                      year = c("year1", "year2", "year3")))
    for (group in c("A", "B"))
      for (gender in c("male", "female"))
        for (age in c("young", "old"))
          grouped.population[group, age, gender, ] <-
            group.prevalence[group, age, gender] * population[age, gender, ]

    # another approach
    grouped.population2 <- array(
      rep(group.prevalence, times = last(dim(population))) *
        rep(population, each = first(dim(group.prevalence))),
      dim = c(2, 2, 2, 3),
      dimnames = list(group = c("A", "B"),
                      age = c("young", "old"),
                      gender = c("male", "female"),
                      year = c("year1", "year2", "year3")))

    # check
    all.equal(grouped.population, grouped.population2)
    ## [1] TRUE

Updated with a benchmark:

    library(microbenchmark)
    library(dplyr)  # assumption: first() and last() are not base R; dplyr is one package that provides them

    # loop-based version
    f1 <- function(group.prevalence, population) {
      grouped.population <- array(NA, dim = c(2, 2, 2, 3),
        dimnames = list(group = c("A", "B"),
                        age = c("young", "old"),
                        gender = c("male", "female"),
                        year = c("year1", "year2", "year3")))
      for (group in c("A", "B")) {
        for (gender in c("male", "female")) {
          for (age in c("young", "old")) {
            grouped.population[group, age, gender, ] <-
              group.prevalence[group, age, gender] * population[age, gender, ]
          }
        }
      }
    }

    # vectorized version using rep()
    f2 <- function(group.prevalence, population) {
      grouped.population2 <- array(
        rep(group.prevalence, times = last(dim(population))) *
          rep(population, each = first(dim(group.prevalence))),
        dim = c(2, 2, 2, 3),
        dimnames = list(group = c("A", "B"),
                        age = c("young", "old"),
                        gender = c("male", "female"),
                        year = c("year1", "year2", "year3")))
    }

    print(microbenchmark(f1(group.prevalence, population)))
    ## Unit: microseconds
    ##                              expr     min      lq     mean   median      uq     max neval
    ##  f1(group.prevalence, population) 101.473 103.998 149.2562 106.8865 115.372 1185.32   100

    print(microbenchmark(f2(group.prevalence, population)))
    ## Unit: microseconds
    ##                              expr    min     lq     mean median      uq     max neval
    ##  f2(group.prevalence, population) 66.392 67.672 70.19873 68.454 69.4205 173.284   100

I believe that the performance gap will widen further as the number of dimensions and the size of each dimension increase.
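One way to probe this is to scale the arrays up and time both approaches. The sketch below uses base system.time() and array sizes chosen arbitrarily for illustration; the names prev, pop, loop_version and rep_version are hypothetical, and timings will differ between machines, so no numbers are shown.

```r
# Hypothetical, larger dimensions (chosen only for illustration)
n_group <- 5; n_age <- 20; n_gender <- 2; n_year <- 50

set.seed(1)
prev <- array(runif(n_group * n_age * n_gender),
              dim = c(n_group, n_age, n_gender))
pop <- array(runif(n_age * n_gender * n_year, min = 100, max = 200),
             dim = c(n_age, n_gender, n_year))

# Loop-based version, indexing by position instead of by name
loop_version <- function(prev, pop) {
  out <- array(NA_real_, dim = c(dim(prev), dim(pop)[3]))
  for (g in seq_len(dim(prev)[1]))
    for (a in seq_len(dim(prev)[2]))
      for (s in seq_len(dim(prev)[3]))
        out[g, a, s, ] <- prev[g, a, s] * pop[a, s, ]
  out
}

# Vectorized version using rep() with 'times' and 'each'
rep_version <- function(prev, pop) {
  array(rep(prev, times = dim(pop)[3]) * rep(pop, each = dim(prev)[1]),
        dim = c(dim(prev), dim(pop)[3]))
}

# Both versions must agree before their timings are worth comparing
stopifnot(isTRUE(all.equal(loop_version(prev, pop), rep_version(prev, pop))))

system.time(for (i in 1:100) loop_version(prev, pop))
system.time(for (i in 1:100) rep_version(prev, pop))
```

Changing the four size variables and rerunning shows how each version scales on a given machine.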
