Why is MATLAB faster than C++ when creating random numbers?

I have been using MATLAB for some time for my projects, and I have almost no experience with C++.

I needed speed, and I heard that C++ can be more efficient and faster than MATLAB. So I tried this:

I created a matrix of random numbers using rand(5000,5000) in MATLAB.

In C++, I initialized a 2D vector and filled it with two nested for loops, each running 5000 times. MATLAB was 4-5 times faster, so I assumed that MATLAB executes vectorized code in parallel, and I rewrote the C++ code using parallel_for. Here is the code:

    #include "stdafx.h"
    #include <cstdlib>
    #include <iostream>
    #include <vector>
    #include <fstream>
    #include <ppl.h>

    using namespace std;
    using namespace concurrency;

    int main()
    {
        int a = 5000, b = 5000;
        vector< vector<int> > vec(a, vector<int>(b));

        // Each row index i is handled by a separate parallel task.
        parallel_for(int(0), a, [&](int i)
        {
            // j is local to the task so that parallel tasks do not share it
            for (int j = 0; j < b; j++)
            {
                vec[i][j] = rand();
            }
        });
    }

With this, the code is about 25% faster than MATLAB's rand(5000,5000). However, C++ uses 100% of the CPU, while MATLAB uses 30%.

So I got MATLAB to use the whole CPU by starting 3 instances of MATLAB, each running rand(5000,5000), and dividing the elapsed time by 3. That made MATLAB twice as fast as C++.

What am I missing? I know this is a tiny example, but I need an answer before porting my code to C++.

Current state:

When I write the C++ code without parallel_for, I get half the MATLAB speed at the same CPU usage. However, the people who answered say the two are almost the same. I do not understand what I am missing.

Here is a snapshot of the optimization menu (image not included).

3 answers

Perhaps this is not an answer, but a little hint: the comparison may be a little unfair due to the use of vectors.

Here is the comparison I wrote. Both versions occupy approximately 100% of one of the four available threads. In both cases, I create a 5000x5000 matrix of random numbers and do it 100 times to get a measurable time.

MATLAB

    function stackoverflow
    tic
    for i = 1:100
        A = rand(5000);
    end
    toc

Elapsed time: ~27.9 s

C++

    #include <iostream>
    #include <stdlib.h>
    #include <time.h>
    #include <ctime>
    using namespace std;

    int main(){
        int N = 5000;
        double ** A = new double*[N];
        for (int i=0;i<N;i++)
            A[i] = new double[N];
        srand(time(NULL));
        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i][j] = rand();
                }
            }
        }
        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

Elapsed time: ~28.7 s

Thus, both examples run almost equally fast.


After looking at @sonystarmap's answer, I added tests for several container types: double*, vector<double>, and vector<vector<double> >. I also added tests where the pointer-based containers are zeroed with memset first, since vector initializes all of its memory.

The C++ code was compiled with these optimization flags: -O3 -march=native

Results:

Matlab: Elapsed time - 28.457788 seconds.

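C++ (in the order the tests appear in the code below):

T = 23844.2 ms (vector<double>)
T = 25161.5 ms (vector<vector<double> >)
T = 25154 ms (double**)
T = 24197.3 ms (double** with memset)
T = 24235.2 ms (double*)
T = 24166.1 ms (double* with memset)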

I essentially can't find the big win you mentioned.

    #include <iostream>
    #include <stdlib.h>
    #include <time.h>
    #include <ctime>
    #include <vector>
    #include <cstring>
    using namespace std;

    int main(){
        const int N = 5000;

        // Test 1: flat vector<double>
        {
            vector<double> A(N*N);
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i*N+j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }

        // Test 2: vector<vector<double> >
        {
            vector<vector<double> > A(N);
            for (int i=0;i<N;i++)
                A[i] = vector<double>(N);
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i][j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }

        // Test 3: double**, not zeroed
        {
            double ** A = new double*[N];
            for (int i=0;i<N;i++)
                A[i] = new double[N];
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i][j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }

        // Test 4: double**, zeroed with memset
        {
            double ** A = new double*[N];
            for (int i=0;i<N;i++) {
                A[i] = new double[N];
                memset(A[i], 0, sizeof(double) * N);
            }
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i][j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }

        // Test 5: flat double*, not zeroed
        {
            double * A = new double[N * N];
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i*N + j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }

        // Test 6: flat double*, zeroed with memset
        {
            double * A = new double[N * N];
            memset(A, 0, sizeof(double) * N * N);
            srand(0);
            clock_t start = clock();
            for (int k=0;k<100;k++){
                for (int i=0;i<N;i++){
                    for (int j=0;j<N;j++){
                        A[i*N + j] = rand();
                    }
                }
            }
            cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
        }
    }

    #include <vector>
    #include <iostream>
    #include <cstdlib>
    #include <ctime>
    #include <cstring>

    int main()
    {
        const int N = 5000;
        std::vector<int> A(N*N);
        srand(0);
        clock_t start = clock();
        for(int k = 0; k < 100; ++k){
            for(int i = 0; i < N * N; ++i) {
                A[i] = rand();
            }
        }
        std::cout << (clock()-start)/(double)(CLOCKS_PER_SEC/1000) << "ms" << "\n";
        return 0;
    }

On my workstation, this went from 25-27 seconds without any compiler optimization flags to 21 seconds with:

-O3 -g -Wall -ftree-vectorizer-verbose=5 -msse -msse2 -msse3 -march=native -mtune=native -ffast-math
