Performance: Matlab vs C ++ Matrix Vector Multiplication

Question

Performance: Matlab vs C ++ Matrix Vector Multiplication

Preamble

Some time ago I asked a question about Matlab vs Python performance ( Performance: Matlab vs Python ). I was surprised that Matlab is faster than Python, especially in meshgrid. When discussing this issue, I was told that I should use a shell in Python to call my C ++ code, because C ++ code is also available to me. I have the same code in C ++, Matlab and Python.

At the same time, I was again surprised to find that Matlab is faster than C ++ in calculating the matrix and . I have a little larger code from which I am exploring a matrix segment - vector multiplication. Larger code performs such multiplications in multiple instances. In general, C ++ code is much faster than Matlab (since calling a function in Matlab has overhead, etc.), but Matlab seems to be superior to C ++ in matrix-vector multiplication (the code snippet below).

results

The table below compares the time required to assemble the kernel matrix and the time required to multiply the matrix by a vector. The results are compiled for the size of the matrix NxN, which Nvaries from 10,000 to 40,000. This is not so much. But the interesting thing is that Matlab is superior to C ++, the more it Ngets. Matlab is 3.8 - 5.8 times faster than all time. Moreover, it also performs faster both in matrix calculation and .

 ___________________________________________
|N=10,000   Assembly    Computation  Total  |
|MATLAB     0.3387      0.031        0.3697 |
|C++        1.15        0.24         1.4    |
|Times faster                        3.8    |
 ___________________________________________ 
|N=20,000   Assembly    Computation  Total  |
|MATLAB     1.089       0.0977       1.187  |
|C++        5.1         1.03         6.13   |
|Times faster                        5.2    |
 ___________________________________________
|N=40,000   Assembly    Computation  Total  |
|MATLAB     4.31        0.348        4.655  |
|C++        23.25       3.91         27.16  |
|Times faster                        5.8    |
 -------------------------------------------

Question

Is there a faster way to do this in C ++? Am I missing something? I understand that C ++ uses loops for, but I understand that Matlab will also do something similar in meshgrid.

Code snippets

Matlab Code:

%% GET INPUT DATA FROM DATA FILES ------------------------------------------- %
% Read data from input file
Data       = load('Input/input.txt');
location   = Data(:,1:2);           
charges    = Data(:,3:end);         
N          = length(location);      
m          = size(charges,2);       

%% EXACT MATRIX VECTOR PRODUCT ---------------------------------------------- %
kex1=ex1; 
tic
Q = kex1.kernel_2D(location , location);
fprintf('\n Assembly time: %f ', toc);

tic
potential_exact = Q * charges;
fprintf('\n Computation time: %f \n', toc);

Class (using meshgrid):

classdef ex1
    methods 
        function [kernel] = kernel_2D(obj, x,y) 
            [i1,j1] = meshgrid(y(:,1),x(:,1));
            [i2,j2] = meshgrid(y(:,2),x(:,2));
            kernel = sqrt( (i1 - j1) .^ 2 + (i2 - j2) .^2 );
        end
    end       
end

C ++ Code:

EDIT

make :

CC=g++ 
CFLAGS=-c -fopenmp -w -Wall -DNDEBUG -O3 -march=native -ffast-math -ffinite-math-only -I header/ -I /usr/include 
LDFLAGS= -g -fopenmp  
LIB_PATH= 

SOURCESTEXT= src/read_Location_Charges.cpp 
SOURCESF=examples/matvec.cpp
OBJECTSF= $(SOURCESF:.cpp=.o) $(SOURCESTEXT:.cpp=.o)
EXECUTABLEF=./exec/mykernel
mykernel: $(SOURCESF) $(SOURCESTEXT) $(EXECUTABLEF)
$(EXECUTABLEF): $(OBJECTSF)
    $(CC) $(LDFLAGS) $(KERNEL) $(INDEX) $(OBJECTSF) -o $@ $(LIB_PATH)
.cpp.o:
    $(CC) $(CFLAGS) $(KERNEL) $(INDEX) $< -o $@

`

# include"environment.hpp"
using namespace std;
using namespace Eigen;

class ex1 
{
public:
    void kernel_2D(const unsigned long M, double*& x, const unsigned long N,  double*&  y, MatrixXd& kernel)    {   
        kernel  =   MatrixXd::Zero(M,N);
        for(unsigned long i=0;i<M;++i)  {
            for(unsigned long j=0;j<N;++j)  {
                        double X =   (x[0*N+i] - y[0*N+j]) ;
                        double Y =   (x[1*N+i] - y[1*N+j]) ;
                        kernel(i,j) = sqrt((X*X) + (Y*Y));
            }
        }
    }
};

int main()
{
    /* Input ----------------------------------------------------------------------------- */
    unsigned long N = 40000;          unsigned m=1;                   
    double* charges;                  double* location;
    charges =   new double[N * m]();  location =    new double[N * 2]();
    clock_t start;                    clock_t end;
    double exactAssemblyTime;         double exactComputationTime;

    read_Location_Charges ("input/test_input.txt", N, location, m, charges);

    MatrixXd charges_           =   Map<MatrixXd>(charges, N, m);
    MatrixXd Q;
    ex1 Kex1;

    /* Process ------------------------------------------------------------------------ */
    // Matrix assembly
    start = clock();
        Kex1.kernel_2D(N, location, N, location, Q);
    end = clock();
    exactAssemblyTime = double(end-start)/double(CLOCKS_PER_SEC);

    //Computation
    start = clock();
        MatrixXd QH = Q * charges_;
    end = clock();
    exactComputationTime = double(end-start)/double(CLOCKS_PER_SEC);

    cout << endl << "Assembly     time: " << exactAssemblyTime << endl;
    cout << endl << "Computation time: " << exactComputationTime << endl;


    // Clean up
    delete []charges;
    delete []location;
    return 0;
}

+6

c++ performance matrix matlab eigen

Fahd Siddiqui 12 . '17 16:07

1

ggael · Accepted Answer · 2017-10-12T20:25:21+0000

, MatLab Intel MKL , . , Eigen . Eigen (, 3.4) , AVX/FMA, , :

-O3 -DNDEBUG -march=native

charges_ - , a VectorXd, , - , -.

Intel MKL, Eigen , , MatLab .

, , , OpenMP ( -fopenmp ), , , , Eigen:

void kernel_2D(const unsigned long M, double* x, const unsigned long N,  double*  y, MatrixXd& kernel)    {
    kernel.resize(M,N);
    auto x0 = ArrayXd::Map(x,M);
    auto x1 = ArrayXd::Map(x+M,M);
    auto y0 = ArrayXd::Map(y,N);
    auto y1 = ArrayXd::Map(y+N,N);
    #pragma omp parallel for
    for(unsigned long j=0;j<N;++j)
      kernel.col(j) = sqrt((x0-y0(j)).abs2() + (x1-y1(j)).abs2());
}

. (Haswell 4- , 2,6 ) 0,36 N = 20000, 0,24 , 0,6 , , MatLab, , , .

Performance: Matlab vs C ++ Matrix Vector Multiplication

More articles: