It turns out that I cannot get the loop to vectorize unless I compile with -ffast-math or -funsafe-math-optimizations.
The two code snippets I played with are the following:
    tsum = 0.0d0
    tvec(1:n) = A(i1:i2, ir)
    do ii = 1,n
        tsum = tsum + tvec(ii)
    enddo
and
tsum = sum(A(i1:i2,ir))
and here are the timings I get when I run the first code fragment with various compilation options:
    10.62 sec ... None
    10.35 sec ... -mtune=native -mavx
     7.44 sec ... -mtune=native -mavx -ffast-math
     7.49 sec ... -mtune=native -mavx -funsafe-math-optimizations
Finally, with the same optimizations, I can vectorize tsum = sum(A(i1:i2,ir)) to get
    7.96 sec ... None
    8.41 sec ... -mtune=native -mavx
    5.06 sec ... -mtune=native -mavx -ffast-math
    4.97 sec ... -mtune=native -mavx -funsafe-math-optimizations
Comparing the sum version built with -mtune=native -mavx against the same version built with -mtune=native -mavx -funsafe-math-optimizations shows a speedup of about 70% (8.41 sec vs. 4.97 sec). (Please note that each case was run only once; before we publish, we will do real benchmarking over several runs.)
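For the multi-run benchmarking mentioned above, a minimal timing harness could look like the following sketch. The array length n, the repeat count nrep, and the random test data are placeholders, not the real A(i1:i2,ir) workload. It reports the best and mean time of several runs using the standard system_clock intrinsic.

```fortran
program time_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 1000000    ! placeholder array length (assumption)
  integer, parameter :: nrep = 20      ! placeholder number of timed runs (assumption)
  real(dp), allocatable :: a(:)
  real(dp) :: tsum, checksum, t, tbest, ttotal
  integer(i8) :: c0, c1, rate
  integer :: irep

  allocate(a(n))
  call random_number(a)                ! stand-in data, not the real matrix

  checksum = 0.0_dp
  ttotal = 0.0_dp
  tbest = huge(1.0_dp)
  do irep = 1, nrep
     a(1) = real(irep, dp)             ! change the input so every run redoes the sum
     call system_clock(c0, rate)
     tsum = sum(a)
     call system_clock(c1)
     t = real(c1 - c0, dp) / real(rate, dp)
     tbest = min(tbest, t)
     ttotal = ttotal + t
     checksum = checksum + tsum        ! use the result so it is not discarded
  end do

  print '(a,es12.4,a)', 'best  : ', tbest, ' sec'
  print '(a,es12.4,a)', 'mean  : ', ttotal / real(nrep, dp), ' sec'
  print '(a,es24.16)',  'check : ', checksum
end program time_sum
```

Building this once per flag set and comparing the best times would give a steadier picture than a single run. The printed checksum also makes it easy to see how much the result moves between flag sets.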
However, I do take a small hit. My values change a bit when using the -f options. Without them, the errors for my variables (v1, v2) are:
    v1 ... 5.60663e-15 9.71445e-17 1.05471e-15
    v2 ... 5.11674e-14 1.79301e-14 2.58127e-15
but with optimizations, errors:
    v1 ... 7.11931e-15 5.39846e-15 3.33067e-16
    v2 ... 1.97273e-13 6.98608e-14 2.17742e-14
which indicates that the arithmetic really is being carried out differently.
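A likely explanation, offered as a sketch rather than a diagnosis of this particular code: -ffast-math implies -funsafe-math-optimizations, which in turn enables -fassociative-math, and that lets GCC reorder floating-point additions. A vectorized reduction keeps several partial sums and combines them at the end, so the rounding differs from a strict left-to-right sum. The example below imitates this by hand with four partial sums. The input values are hypothetical and chosen to exaggerate the effect far beyond the 1e-15 level seen above.

```fortran
program reassociation_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: x(n), lanes(4), s_seq, s_lanes
  integer :: i, k

  ! Hypothetical data: large values that cancel, mixed with small ones.
  x = [1.0d16, 1.0d0, 1.0d0, 1.0d0, -1.0d16, 1.0d0, 1.0d0, 1.0d0]

  ! Strict left-to-right summation, as in the scalar loop.
  s_seq = 0.0_dp
  do i = 1, n
     s_seq = s_seq + x(i)
  end do

  ! Four independent partial sums, imitating a 4-wide vector reduction.
  lanes = 0.0_dp
  do i = 1, n, 4
     do k = 1, 4
        lanes(k) = lanes(k) + x(i + k - 1)
     end do
  end do
  s_lanes = (lanes(1) + lanes(2)) + (lanes(3) + lanes(4))

  ! With IEEE double precision and no reassociating flags, the
  ! sequential sum is 3 and the four-lane sum is 6 (the exact answer).
  print *, 'sequential :', s_seq
  print *, 'four lanes :', s_lanes
  print *, 'difference :', s_lanes - s_seq
end program reassociation_demo
```

Compile this without -ffast-math, otherwise the compiler is free to reorder both loops. Neither ordering is more correct in general; they round differently, which would be consistent with errors that shift slightly without blowing up.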