In the following code, two arrays are passed to a subroutine and some simple operations are performed on them inside DO loops. Three cases are compared, differing only in the extra statement inside the innermost loop: Case 1 performs no extra operation, while Cases 2 and 3 assign a value to a pointer variable.
!------------------------------------------------------------------------
module mymod
    implicit none
    integer, pointer :: n_mod
    integer :: nloop
contains
    !.........................................................
    subroutine test_2dim ( a, b, n )
        integer :: n
        real    :: a(n,n), b(n,n)
        integer, pointer :: n_ptr
        integer i1, i2, iloop

        n_ptr => n_mod

        do iloop = 1, nloop
        do i2 = 1, n
        do i1 = 1, n
            b(i1,i2) = a(i1,i2) + b(i1,i2) + iloop

            !(nothing here)    !! Case 1 : gfort => 3.6 sec, ifort => 3.2 sec
            ! n_ptr = n        !! Case 2 : gfort => 15.9 sec, ifort => 6.2 sec
            ! n_ptr = n_mod    !! Case 3 : gfort => 3.6 sec, ifort => 3.5 sec
        enddo
        enddo
        enddo
    endsubroutine
endmodule

!------------------------------------------------------------------------
program main
    use mymod
    implicit none
    integer, target :: n
    real, allocatable :: a(:,:), b(:,:)

    nloop = 10000 ; n = 1000
    allocate( a( n, n ), b( n, n ) )
    a = 0.0 ; b = 0.0

    n_mod => n
    call test_2dim ( a, b, n )

    print *, a(n,n), b(n,n)   !! for check
end
Note that the pointer n_ptr is associated, via the module variable n_mod, with the variable n that serves as the upper bound of the DO loops, so in principle an assignment through the pointer inside the loop could affect the loop bounds. In practice, however, the bounds are never changed (only the same value is copied). gfortran 4.8 and ifort 14.0, both with -O3, gave the timings shown in the comments above. It is remarkable that Case 2 is much slower than Case 1, even though the actual computation is essentially the same. I suspected that this is because the compiler cannot prove that the upper bound of the innermost loop (over i1) is unchanged by the pointer assignment, and therefore avoids aggressive optimizations. To test this, I tried the following procedure in place of test_2dim():
subroutine test_1dim ( a, b, n )
    integer :: n
    real    :: a(n * n), b(n * n)
    integer, pointer :: n_ptr
    integer iloop, i

    n_ptr => n_mod

    do iloop = 1, nloop
    do i = 1, n * n
        b( i ) = a( i ) + b( i ) + iloop

        ! (nothing here)   !! Case 1 : gfort => 3.6 sec, ifort => 2.3 sec
        ! n_ptr = n        !! Case 2 : gfort => 15.9 sec, ifort => 6.0 sec
        ! n_ptr = n_mod    !! Case 3 : gfort => 3.6 sec, ifort => 6.1 sec
    enddo
    enddo
endsubroutine
The only difference between test_1dim() and test_2dim() is whether arrays a and b are accessed with one- or two-dimensional indices (there is essentially no difference in the amount of computation). Surprisingly, Case 2 is again very slow, even though there is now only one DO loop over the array elements. Since a Fortran DO loop determines its iteration count at loop entry [Ref], I expected test_1dim() to be unaffected by the pointer assignment, but that was not the case. Is there a reasonable explanation for this behavior? (I hope I have not made some silly mistake that accounts for the timing difference.)
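For reference, the fact that the iteration count is fixed at loop entry can be seen from this minimal self-contained example (not part of the timing tests above): modifying the bound variable inside the loop does not change the number of iterations.

program trip_count
    implicit none
    integer :: i, n, cnt
    n = 3 ; cnt = 0
    do i = 1, n
        n   = 100         ! modifying the bound variable here has no effect:
        cnt = cnt + 1     ! the trip count was fixed when the loop started
    enddo
    print *, cnt          ! prints 3
end program trip_count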
My motivation for this question: I make extensive use of derived types to hold the bounds of multi-dimensional loops, e.g.
module Grid_mod
    type Grid_t
        integer :: N1, N2, N3
    endtype
    ....

subroutine some_calc ( vector, grid )
    type(Grid_t) :: grid
    ....
    do i3 = 1, grid % N3
    do i2 = 1, grid % N2
    do i1 = 1, grid % N1
        (... various operations ...)
    enddo
    enddo
    enddo
So far I have not paid much attention to whether the Grid_t objects carry the TARGET or POINTER attribute (on the assumption that it has almost no effect on performance). Now, however, I wonder whether this can lead to poor performance if it prevents the compiler from determining that the upper bounds are constant inside the loops (even though I never change the bounds in real codes). I would therefore appreciate any advice on whether I should be more careful with the TARGET and POINTER attributes of such variables (including derived-type components like those of the grid object above; a sketch of the variants I have in mind follows).
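To make the concern concrete, here is a hypothetical sketch of the declaration variants I mean (not taken from my real code); the question is whether the second and third variants may inhibit loop optimization:

subroutine some_calc ( vector, grid )
    type(Grid_t), intent(in)         :: grid   ! plain dummy: bounds cannot be aliased by a pointer
!   type(Grid_t), intent(in), target :: grid   ! some pointer elsewhere may be associated with it
!   type(Grid_t), pointer            :: grid   ! the object itself is accessed through a pointer
    ....
    do i3 = 1, grid % N3   ! can the compiler assume grid % N3 etc. stay constant?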
Update
Following the suggestion of @francescalus, I tried adding "intent(in), value" to the dummy argument "n".
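A minimal sketch of the change (everything else in test_1dim() / test_2dim() is left as before):

subroutine test_2dim ( a, b, n )
    integer, intent(in), value :: n   !! "n" is now a local copy passed by value,
                                      !! so no pointer can alias it inside the routine
    real :: a(n,n), b(n,n)
    ....

With this change, the results are as follows: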
test_1dim():
    Case 1 : gfort => 3.6 s, ifort => 2.3 s
    Case 2 : gfort => 3.6 s, ifort => 3.1 s
    Case 3 : gfort => 3.6 s, ifort => 3.4 s

test_2dim():
    Case 1 : gfort => 3.7 s, ifort => 3.1 s
    Case 2 : gfort => 3.7 s, ifort => 3.1 s
    Case 3 : gfort => 3.7 s, ifort => 6.4 s
Although ifort gives a somewhat anomalous result for Case 3 of test_2dim() (6.4 s), all the other results now show essentially the same good performance. This suggests that it is the compiler's analysis of the code, not the cost of the pointer assignment itself, that affects performance. Since it seems important to tell the compiler that the bounds are constant, I also tried copying the dummy argument n (here without "intent(in), value") to a local variable n_ and using it for the loop bounds:
integer :: n     !! dummy argument
integer :: n_    !! a local variable
...
n_ = n
do i2 = 1, n_
do i1 = 1, n_
    b(i1,i2) = a(i1,i2) + b(i1,i2) + iloop
    ...
The results for test_2dim() are as follows:
test_2dim():
    Case 1 : gfort => 3.6 s, ifort => 3.1 s
    Case 2 : gfort => 15.9 s, ifort => 6.2 s
    Case 3 : gfort => 3.7 s, ifort => 6.4 s
Unfortunately (and contrary to my expectation), Case 2 did not improve at all... Although copying n to the local variable n_ should guarantee that n_ is constant inside the DO loops, the compiler still seems unsatisfied, presumably because the array shapes are still declared with n rather than n_, and so it still avoids aggressive optimization (just my guess).
Update 2
Following @innoSPG's suggestion, I also replaced n with n_ inside the DO loops for Case 2, and then the code turned out to be as fast as Case 1! Specifically, the code
n_ = n
do i2 = 1, n_
do i1 = 1, n_
    b(i1,i2) = a(i1,i2) + b(i1,i2) + iloop
    n_ptr = n_    !! Case 2 : gfort => 3.7 sec, ifort => 3.1 sec
But, as the answer suggests, this speed-up may simply be because the compiler eliminates the (now clearly redundant) assignment altogether. So I probably need to examine more realistic (less simplistic) codes to see how pointers and pointer components affect loop optimization...
(... I apologize for the very long question ...)