I have what I believe is a relatively simple OpenMP construct. The problem is that the program runs about 100-300x faster with 1 thread than with 2 threads. 87% of the program's time is spent in gomp_send_wait() and another 9.5% in gomp_send_post().
The program gives correct results, but I wonder whether there is a flaw in the code that causes a resource conflict, or whether the thread-creation overhead is simply not worth it for a loop with chunk size 4. p varies from 17 to 1000, depending on the size of the molecule we are modeling.
My numbers refer to the worst case, where p is 17 and the chunk size is 4. Performance is the same whether I use static, dynamic, or guided scheduling. With p=150 and chunk size 75, the program is still 75x-100x slower than serial.
...
double e_t_sum = 0.0;
double e_in_sum = 0.0;
int nthreads, tid;

for (i = 0; i < p; i++){
    if (i != c){
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
        d_x = V_in[i].x - t_x;
        d_y = V_in[i].y - t_y;
        d_z = V_in[i].z - t_z;
        rr = d_x * d_x + d_y * d_y + d_z * d_z;
        if (i < c){
            ee_t[i][c] = energy(rr, V_in[i].q, V_in[c].q, V_in[i].s, V_in[c].s);
            e_t_sum += ee_t[i][c];
            e_in_sum += ee_in[i][c];
        }
        else{
            ee_t[c][i] = energy(rr, V_in[i].q, V_in[c].q, V_in[i].s, V_in[c].s);
            e_t_sum += ee_t[c][i];
            e_in_sum += ee_in[c][i];
        }
        // if(pid==0){printf("e_t_sum[%d]: %f\n", tid, e_t_sum[tid]);}
    }
}//end parallel for
e_t += e_t_sum;
e_t -= e_in_sum;
...
Mason E. Kramer