I'm really interested in the answer if you find it!
Unfortunately, a lot of things can change the numerical results ...
To ensure efficiency, the LAPACK algorithm iterates over the submatrix blocks. For optimal efficiency, the block size should correspond to some size of CPU L1 / L2 / L3 caches ...
The block size is controlled by the LAPACK ILAENV routine, see http://www.netlib.org/lapack/lug/node120.html
Of course, if the block size is different, the result will be quantitatively different ... It is possible that the lapack / BLAS DLL provided by Matlab was compiled with another custom version of ILAENV on two machines, or that ILAENV was replaced with an individual optimized version that takes into account cache size , you yourself can check out a small C program that calls ILAENV and associates it with the DLL provided by Matlab ...
For basic BLAS, this is even worse: if an optimized version is used, some valid mul-add FPU commands can be used if available, for example, and they are not necessarily available on all FPUs. AFAIK, Matlab use ATLAS http://math-atlas.sourceforge.net/ , and you will need to find out how the liraria was created ... You would need to track the differences as a result of operations with basic algebras (for example, the matrix matrix * or matrix matrix ...).
UPDATE: Even if ILAENVs match, QR uses elementary rotations, so it obviously depends on the implementation of sin / cos. Unfortunately, no standard says exactly how sin and cos should behave bitwise, they can be from several ulp from a rounded exact result and differ from one library to another and will give different results for different architectures / compilers (wired in x87 FPU). Therefore, if you do not provide your own version of these functions (or work in ADA) and compile them with specially developed compiler options and, possibly, precisely control FPU modes where it is not possible to find exactly the same results on different architectures ... also should ask Matlab if they took special care to provide deterministic floating point results when they compiled these libraries.
source share