How to speed up an Eigen matrix-matrix product?

I am studying the simple multiplication of two large matrices using the Eigen library. This multiplication appears to be noticeably slower than in Matlab and Python for matrices of the same size.

Is there anything I can do to make Eigen's multiplication faster?

Problem Details

X: 1000 x 50,000 random matrix

Y: 50,000 x 300 random matrix

Experiment timings (on my late-2011 MacBook Pro)

Using Matlab: X * Y takes ~1.3 sec

Using Enthought Python: numpy.dot(X, Y) takes ~2.2 sec

Using Eigen: X * Y takes ~2.7 sec

Background information

You can get the Eigen code (as a MEX function): https://gist.github.com/michaelchughes/4742878

This MEX function reads in two matrices from Matlab and returns their product.

Running this MEX function without the matrix product (i.e., doing only the IO) carries negligible overhead, so the IO between the function and Matlab does not explain the large performance gap. The difference clearly comes from the matrix product operation itself.

I am compiling with g++ using these optimization flags: "-O3 -DNDEBUG".

I use the latest stable Eigen header files (3.1.2).

Any suggestions on how to improve Eigen's performance? Can someone reproduce the gap that I see?

UPDATE: The compiler really does seem to matter. The original Eigen timing was obtained using Apple's Xcode version of g++: llvm-g++-4.2.

When I use g++-4.7, installed via MacPorts (same CXXOPTIMFLAGS), I get 2.4 sec instead of 2.7.

Any other compilation suggestions would be greatly appreciated.

You can also get the C++ source code for this experiment: https://gist.github.com/michaelchughes/4747789

./MatProdEigen 1000 50000 300

reports 2.4 seconds under g++-4.7

+5
matlab eigen mex
3 answers

First of all, when comparing performance, make sure turbo-boost (TB) is disabled. On my system, using gcc 4.5 from MacPorts and with turbo-boost off, I get 3.5 s, which corresponds to 8.4 GFLOPS, while the theoretical peak of my 2.3 GHz core i7 is 9.2 GFLOPS, so that's not bad.

Matlab is built on the Intel MKL and, judging from the reported performance, it is clearly using the multi-threaded version. It is unlikely that a small library like Eigen could beat Intel on its own CPU!

Numpy can use any BLAS library: ATLAS, MKL, OpenBLAS, eigen-blas, etc. I assume that in your case ATLAS was used, which is also fast.

Finally, here is how you can improve performance: enable multithreading in Eigen by compiling with -fopenmp. By default, Eigen uses the number of threads defined by OpenMP. Unfortunately, that number corresponds to the number of logical cores rather than physical cores, so make sure hyper-threading is disabled, or set the OMP_NUM_THREADS environment variable to the number of physical cores. Here I get 1.25 s (without turbo-boost) and 0.95 s with turbo-boost.

+12

The reason Matlab is faster is that it uses the Intel MKL. Eigen can also use it (see here), but you have to buy it, of course.

That said, there are a number of reasons why Eigen may be slower. To compare Python vs. Matlab vs. Eigen, you really need to code three equivalent versions of the operation in the respective languages. Also note that Matlab caches results, so you really need to start from a fresh Matlab session to be sure its magic is not fooling you.

In addition, Matlab's MEX overhead is not nonexistent. The OP reports that newer versions "fix" the problem, but I would be surprised if all the overhead had been completely eliminated.

+2

Eigen does not take advantage of the AVX instructions that Intel introduced with the Sandy Bridge architecture. This probably explains most of the performance difference between Eigen and MATLAB. I found a branch that adds AVX support at https://bitbucket.org/benoitsteiner/eigen , but as far as I can tell, it has not yet been merged into Eigen.

+2
