Is it useful to transpose an array to use column operations?

Suppose we are working with a language that stores arrays in column-major order. Suppose also that we have a function that takes a 2-D array as an argument and returns one. Can one argue that it is (or is not) beneficial to transpose this array before calling the function, so that it works on columns rather than rows? Or does the cost of the transposition negate the benefit of column operations?

As an example, in R, I have an object of class ts named y, which has dimension n×p, i.e. I have p time series of length n.

I need to do some calculations on y in Fortran, using two nested loops with the following structure:

 do i = 1, n
   do j = 1, p
     ! just an example: some row-wise operations on `y`
     x(i,j) = a*y(i,j)
     D = ddot(m, y(i,1:p), 1, b, 1)
     ! ...
   end do
 end do

Since Fortran (like R) uses column-major storage, it would be better to do the calculations on the transposed p×n array. Therefore, instead of

 out <- .Fortran("something", y = array(y, dim(y)), x = array(0, dim(y)))
 ynew <- out$y
 x <- out$x

I could use

 out <- .Fortran("something2", y = t(array(y, dim(y))), x = array(0, dim(y)[2:1]))
 ynew <- t(out$y)
 x <- t(out$x)

where the Fortran routine something2 would look something like

 do i = 1, n
   do j = 1, p
     ! just an example: some column-wise operations on `y`
     x(j,i) = a*y(j,i)
     D = ddot(m, y(1:p,i), 1, b, 1)
     ! ...
   end do
 end do

Does the choice of approach always depend on the sizes n and p, or can one say that one approach is better in terms of computational speed and/or memory requirements? In my application, n is usually much larger than p, which in most cases is between 1 and 10.

1 answer

This is more of a comment, but I wanted to include some code: in old-school F77 you would essentially have to use the second approach, since

 y(1:p,i) 

is just a pointer to y(1,i), with the following p values adjacent in memory.

The first construction,

 y(i,1:p) 

is a list of values strided across memory, so a copy of the data would probably have to be passeded to the subroutine. I say "probably" because I have only the vaguest idea of how a modern optimizing compiler handles these things. At best the copy costs you little; in the worst case it can be quite harmful: imagine the array is so large that you take a page fault for every element of the vector.

After all, the only way to answer this is to test it yourself.

---------- Edit: I did a little testing and it confirmed my guess: passing rows y(i,1:p) really does cost you compared with passing columns y(1:p,i). I used a routine that does almost nothing, so that the difference would show; my guess is that with any subroutine doing real work, the hit is negligible.

By the way (and perhaps this helps explain what is going on), passing every other value of a column, y(1:p:2,i), takes orders of magnitude longer than passing the whole column, while passing every other value of a row cuts the time roughly in half compared with passing the whole row.

(using gfortran 12)

