There are two main drawbacks to your Fortran implementation:
- You mix IO and calculations (and read from a file record by record).
- You do not use vector / matrix operations.
This implementation performs the same operation as yours and runs 20 times faster on my machine:
program test integer gridsize,unit real mini,maxi,mean real, allocatable :: tmp (:,:,:) gridsize=512 unit=40 allocate( tmp(gridsize, gridsize, gridsize)) open(unit=unit,file='T.out',status='old',access='stream',& form='unformatted',action='read') read(unit=unit) tmp close(unit=unit) mini = minval(tmp) maxi = maxval(tmp) mean = sum(tmp)/gridsize**3 print *, mini, maxi, mean end program
The idea is to read the entire file into one tmp array at a time. Then I can use the MAXVAL , MINVAL and SUM functions directly in the array.
For the accuracy problem: just use double precision values ββand do the conversion on the fly as
mean = sum(real(tmp, kind=kind(1.d0)))/real(gridsize**3, kind=kind(1.d0))
only slightly increases the calculation time. I tried to perform an operation with elements and slices, but this only increased the required time at the optimization level by default.
In -O3 element addition performs 3% better than an array operation. The difference between double and single precision operations is less than 2% on my machine - on average (individual runs deviate much more).
Here is a very fast implementation using LAPACK:
program test integer gridsize,unit, i, j real mini,maxi integer :: t1, t2, rate real, allocatable :: tmp (:,:,:) real, allocatable :: work(:) ! double precision :: mean real :: mean real :: slange call system_clock(count_rate=rate) call system_clock(t1) gridsize=512 unit=40 allocate( tmp(gridsize, gridsize, gridsize), work(gridsize)) open(unit=unit,file='T.out',status='old',access='stream',& form='unformatted',action='read') read(unit=unit) tmp close(unit=unit) mini = minval(tmp) maxi = maxval(tmp) ! mean = sum(tmp)/gridsize**3 ! mean = sum(real(tmp, kind=kind(1.d0)))/real(gridsize**3, kind=kind(1.d0)) mean = 0.d0 do j=1,gridsize do i=1,gridsize mean = mean + slange('1', gridsize, 1, tmp(:,i,j),gridsize, work) enddo !i enddo !j mean = mean / gridsize**3 print *, mini, maxi, mean call system_clock(t2) print *,real(t2-t1)/real(rate) end program
This uses a 1-norm SLANGE single-precision matrix for matrix columns. Runtime is even faster than an approach using single precision functions and does not show the problem of accuracy.