How to reorder the 1st tusk array in R (don't know common dims)

Question

How to reorder the 1st tusk array in R (don't know common dims)

I have an array whose first dimension I need to multiply / index / reorder. For instance:

arr <- array(1:24, dim=c(4,3,2)) arr[4:1,,]

Simple, works like a charm.

However, is there a way to do this when I'm not sure how many dimensions the array has? To be clear, I will always know the size of this first dimension (that is, I know dim(arr)[1] ), I just don't know length(dim(arr)) .

+4

arrays r indexing order subset

rbatt Aug 14 '15 at 0:07

source share

3 answers

Here is one possible way, although it is still a bit slow.

 do.call(`[`, c(list(arr, 4:1), lapply(dim(arr)[-1], seq_len))) ## , , 1 ## ## [,1] [,2] [,3] ## [1,] 4 8 12 ## [2,] 3 7 11 ## [3,] 2 6 10 ## [4,] 1 5 9 ## ## , , 2 ## ## [,1] [,2] [,3] ## [1,] 16 20 24 ## [2,] 15 19 23 ## [3,] 14 18 22 ## [4,] 13 17 21

do.call requires a list of arguments that (if not named) will be passed to the specified function (in this case [ ) in the order in which they were delivered.

Above, we pass the list(arr, 4:1, 1:3, 1:2) to [ , which is equivalent to doing: `[`(arr, 4:1, 1:3, 1:2) (which, in its the queue is equivalent to arr[4:1, 1:3, 1:2] ).

Timing:

 microbenchmark(subset=arr[4:1,,], jb=do.call(`[`, c(list(arr, 4:1), lapply(dim(arr)[-1], seq_len))), times=1E3) ## Unit: microseconds ## expr min lq mean median uq max neval ## subset 1.140 1.711 1.765575 1.711 1.711 15.395 1000 ## jb 9.693 10.834 11.464768 11.404 11.974 96.365 1000

(Ignore absolute times - my system is currently under tension.)

Thus, it takes about ten times more than a simple subset. There is probably room for improvement, although, as @thelatemail comments, timings are much more comparable across larger arrays.

EDIT

As @thelatemail suggested, index sequences can be replaced with TRUE , which will speed things up a bit.

 do.call(`[`, c(list(arr, 4:1), rep(TRUE, length(dim(arr))-1)))

Dates again:

 microbenchmark(subset=arr[4:1,,], jb=do.call(`[`, c(list(arr, 4:1), lapply(dim(arr)[-1], seq_len))), tlm=do.call(`[`, c(list(arr, 4:1), rep(TRUE, length(dim(arr)) - 1))), times=1E3) ## Unit: microseconds ## expr min lq mean median uq max neval ## subset 1.140 1.711 2.146474 1.711 2.281 124.875 1000 ## jb 10.834 11.974 13.455343 12.545 13.685 293.086 1000 ## tlm 6.272 7.413 8.348592 7.983 8.553 95.795 1000

+3

jbaums Aug 14 '15 at 1:14

source share

I have an ugly and inefficient solution. The problem with a simpler approach is that I don't know how to use the default values [ do.call using do.call . Perhaps someone will see this and be inspired.

Here is the function:

 orderD1 <- function(x, ord){ dims <- dim(x) ndim <- length(dims) stopifnot(ndim>0) if(ndim==1){ return(x[ord]) } wl_i <- which(letters=="i") dimLetters <- letters[wl_i:(wl_i+ndim-1)] dimList <- structure(vector("list",ndim), .Names=dimLetters) dimList[[1]] <- ord for(i in 2:ndim){ dimList[[i]] <- 1:dims[i] } do.call("[",c(list(x=x),dimList)) }

Here is an implementation using an example in the question:

 orderD1(arr, 4:1) , , 1 [,1] [,2] [,3] [1,] 4 8 12 [2,] 3 7 11 [3,] 2 6 10 [4,] 1 5 9 , , 2 [,1] [,2] [,3] [1,] 16 20 24 [2,] 15 19 23 [3,] 14 18 22 [4,] 13 17 21

An example of how slow it is ...

 library(microbenchmark) microbenchmark(arr[4:1,,], orderD1(arr, 4:1), times=1E3) Unit: nanoseconds expr min lq mean median uq max neval arr[4:1, , ] 864 1241 1445.876 1451 1596.0 17191 1000 orderD1(arr, 4:1) 52020 54061 56286.856 54909 56194.5 179363 1000

I am happy to make a more elegant / compact solution.

+2

rbatt Aug 14 '15 at 0:25

source share

bgoldst · Accepted Answer · 2015-08-14T02:13:44+0000

Here's a weird alternative. This idea is based on the implementation quirk, which I noticed at some point that R seems to represent the “missing” function arguments as null characters . One of the reasons why this is so strange is that R usually doesn't allow create characters with null names:

 as.symbol(''); ## Error in as.symbol("") : attempt to use zero-length variable name

But through some mess, I found that you can slip past the R-defense by accessing the expression parsing tree, which includes the “missing” argument and indexes the parsing tree element containing the “missing” argument. Here is a demonstration of some strange the behavior you get from this thing:

 substitute(x[]); ## parse tree involving missing argument ## x[] as.list(substitute(x[])); ## show list representation; third component is the guy ## [[1]] ## `[` ## ## [[2]] ## x ## ## [[3]] ## ## substitute(x[])[[3]]; ## prints nothing! ## (function(x) c(typeof(x),mode(x),class(x)))(substitute(x[])[[3]]); ## it a symbol alright ## [1] "symbol" "name" "name" as.character(substitute(x[])[[3]]); ## gets the name of the symbol: the empty string! ## [1] "" i.dont.exist <- substitute(x[])[[3]]; ## store in variable i.dont.exist; ## wha?? ## Error: argument "i.dont.exist" is missing, with no default

In any case, here you can find a solution for the OP problem:

 arr <- array(1:24,4:2); do.call(`[`,c(list(arr,4:1),rep(list(substitute(x[])[[3]]),length(dim(arr))-1))); ## , , 1 ## ## [,1] [,2] [,3] ## [1,] 4 8 12 ## [2,] 3 7 11 ## [3,] 2 6 10 ## [4,] 1 5 9 ## ## , , 2 ## ## [,1] [,2] [,3] ## [1,] 16 20 24 ## [2,] 15 19 23 ## [3,] 14 18 22 ## [4,] 13 17 21 ##

~~I was hoping this would surpass all other solutions, but @thelatemail, you win this round:~~ Aha! I realized that we can pre-copy the list of the empty character (store the empty character in a variable by itself, i.e. Not in the list, cannot be used, as I showed above) and rep() , which are listed in the solution, and not imposing all the overhead of substitute() on parsing a dummy expression each time the solution is called. And here is the performance:

 straight <- function() arr[4:1,,]; jb <- function() do.call(`[`,c(list(arr,4:1),lapply(dim(arr)[-1],seq_len))); tlm <- function() do.call(`[`,c(list(arr,4:1),rep(TRUE,length(dim(arr))-1))); orderD1 <- function(x,ord) { dims <- dim(x); ndim <- length(dims); stopifnot(ndim>0); if (ndim==1) return(x[ord]); wl_i <- which(letters=="i"); dimLetters <- letters[wl_i:(wl_i+ndim-1)]; dimList <- structure(vector("list",ndim),.Names=dimLetters); dimList[[1]] <- ord; for (i in 2:ndim) dimList[[i]] <- 1:dims[i]; do.call("[",c(list(x=x),dimList)); }; rbatt <- function() orderD1(arr,4:1); bgoldst <- function() do.call(`[`,c(list(arr,4:1),rep(list(substitute(x[])[[3]]),length(dim(arr))-1))); ls0 <- list(substitute(x[])[[3]]); ls0; ## [[1]] ## ## bgoldst2 <- function() do.call(`[`,c(list(arr,4:1),rep(ls0,length(dim(arr))-1))); microbenchmark(straight(),jb(),tlm(),rbatt(),bgoldst(),bgoldst2(),times=1e5); ## Unit: nanoseconds ## expr min lq mean median uq max neval ## straight() 428 856 1161.038 856 1284 998142 1e+05 ## jb() 4277 5988 7136.534 6843 7271 1629357 1e+05 ## tlm() 2566 3850 4622.668 4277 4705 1704196 1e+05 ## rbatt() 24804 28226 31975.583 29509 31219 34970873 1e+05 ## bgoldst() 3421 4705 5601.300 5132 5560 1918878 1e+05 ## bgoldst2() 2566 3850 4533.383 4277 4705 1034065 1e+05

Just discovered that there is an easier way to get an empty character, which seems to have been available all the time:

 substitute(); ##

My trick substitute(x[])[[3]] now looks silly.

Out of curiosity, I compared using substitute() directly with other solutions, and it carries a small cost compared to bgoldst2() , which makes it a little worse than tlm() :

 bgoldst3 <- function() do.call(`[`,c(list(arr,4:1),rep(list(substitute()),length(dim(arr))-1))); microbenchmark(straight(),jb(),tlm(),rbatt(),bgoldst(),bgoldst2(),bgoldst3(),times=1e5); ## Unit: nanoseconds ## expr min lq mean median uq max neval ## straight() 428 856 1069.340 856 1284 850603 1e+05 ## jb() 4277 5988 6916.899 6416 7270 2978180 1e+05 ## tlm() 2566 3849 4307.979 4277 4704 3138122 1e+05 ## rbatt() 24377 28226 30882.666 29508 30364 36768360 1e+05 ## bgoldst() 2994 4704 5165.019 5132 5560 2050171 1e+05 ## bgoldst2() 2566 3849 4232.816 4277 4278 1085813 1e+05 ## bgoldst3() 2566 3850 4545.508 4277 4705 1004131 1e+05

How to reorder the 1st tusk array in R (don't know common dims)

Here is the function:

Here is an implementation using an example in the question:

An example of how slow it is ...

More articles: