Good day to all!
I am working on a molecular dynamics simulation, and I recently started trying to parallelize it. At first glance everything looked quite simple: put a #pragma omp parallel for directive before the most time-consuming loops. But, as it happens, the functions called in those loops operate on arrays or, more precisely, on arrays that belong to an object of my class, which holds all the information about the particle system and the functions that act on it. When I added that #pragma before one of the most time-consuming loops, the computation time actually increased several times over, even though my 2-core, 4-thread processor was fully loaded.
To figure this out, I wrote another, simpler program. This test program runs two identical loops, the first in parallel and the second sequentially, and measures the time each one takes. The results surprised me: whenever the first loop ran in parallel, its time dropped compared to the sequential run (1500 ms versus 6000 ms), but the time of the second loop increased sharply (15 000 ms versus 6000 ms in the sequential case).
I tried the private() and firstprivate() clauses, but the results were the same. Shouldn't every variable defined and initialized before the parallel region be shared automatically anyway? The time of the second loop returns to normal if it works on a different vector, vec2, but creating a new vector for each iteration is obviously not an option. I also tried putting the actual update of vec1 inside a #pragma omp critical region, but that did not help either. Adding a shared(vec1) clause made no difference.
I would appreciate it if you could point out my mistakes and show me the right way to do this.
Do I need to put that private(i) in the code at all?
Here is the test program:
    #include "stdafx.h"
    #include <omp.h>
    #include <array>
    #include <time.h>
    #include <vector>
    #include <iostream>
    #include <Windows.h>

    using namespace std;

    #define N1  1000
    #define N2  4000
    #define dim 1000

    int main(){
        vector<int> res1, res2;
        vector<double> vec1(dim), vec2(N1);
        clock_t t, tt;
        int k = 0;

        for (k = 0; k < dim; k++){
            vec1[k] = 1;
        }

        // First loop: parallel
        t = clock();
        #pragma omp parallel
        {
            double temp = 0;   // per-thread accumulator
            int i, j, k;       // declared inside the region, so private to each thread
            #pragma omp for private(i)
            for (i = 0; i < N1; i++){
                for (j = 0; j < N2; j++){
                    for (k = 0; k < dim; k++){
                        temp += j;
                    }
                }
                vec1[i] += temp;
                temp = 0;
            }
        }
        tt = clock();
        cout << tt - t << endl;   // elapsed clock ticks (milliseconds with Visual Studio)

        for (int k = 0; k < dim; k++){
            vec1[k] = 1;
        }

        // Second loop: sequential, updates vec1[g] directly in the innermost loop
        t = clock();
        for (int g = 0; g < N1; g++){
            for (int h = 0; h < N2; h++){
                for (int y = 0; y < dim; y++){
                    vec1[g] += h;
                }
            }
        }
        tt = clock();
        cout << tt - t << endl;

        getchar();
    }
Thank you for your time!
P.S. I am using Visual Studio 2012, and my processor is an Intel Core i3-2370M. My build file consists of two parts:
http://pastebin.com/suXn35xj
http://pastebin.com/EJAVabhF