Multiple levels of parallelism using OpenMP - Possible? Smart? Practical?

I am currently working on a C++ sparse matrix / math / iterative solver library for a simulation tool that I manage. I would have preferred to use an existing package, but after careful study none turned out to be appropriate for our simulator (we looked at FLENS, IT++, PETSc, Eigen, and several others). The good news is that my solvers and sparse matrix structures are now very efficient and reliable. The bad news is that I am now looking into parallelization using OpenMP, and the learning curve is a little steep.

The domain we solve can be divided into subdomains that come together in a block-diagonal format. So our storage scheme ends up looking like an array of smaller square matrices (blocks[]), each in a format suited to its subdomain (e.g. Compressed Row Storage: CRS, Compressed Diagonal Storage: CDS, dense, etc.), plus a background matrix (currently CRS) that accounts for the connectivity between subdomains.

The "hot spot" in most (all?) iterative solvers is the matrix-vector multiplication, and this is true of my library. So I have been focusing on optimizing my MxV routines. For the block-diagonal structure, the pseudo-code for M * x = b is as follows:

b=background_matrix*x
start_index = 1;
end_index = 0;
for(i=1:number of blocks) {
    end_index = start_index + blocks[i].numRows() - 1;  // inclusive index of the last row of block i
    b.range(start_index, end_index) += blocks[i] * x.range(start_index, end_index);
    start_index = end_index+1;
}

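where background_matrix is the background (CRS) matrix, blocks is the array of subdomain matrices, and .range returns the portion of a vector from a starting index to an ending index.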

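Obviously, the loop can be (and has been) parallelized, since each iteration is independent of the others (the ranges do not overlap). Since a typical system has 10-15 blocks, using 4+ threads makes a real difference.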
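For anyone who wants something concrete, here is a minimal sketch of the block loop with OpenMP. The `CrsMatrix` struct and `block_diag_multiply` are hypothetical stand-ins, not the library's actual classes, and every block is assumed to be CRS for brevity. The one real change from the pseudo-code is that the running start_index is replaced by precomputed offsets, so no iteration depends on the previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CRS matrix (an assumption, not the library's class).
struct CrsMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col;      // column index of each nonzero
    std::vector<double> val;           // value of each nonzero

    // y[0..rows) += A * x, where x and y point at the relevant slice.
    void multiply_add(const double* x, double* y) const {
        for (std::size_t r = 0; r < rows; ++r) {
            double sum = 0.0;
            for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
                sum += val[k] * x[col[k]];
            y[r] += sum;
        }
    }
};

// b = background * x + blockdiag(blocks) * x, with 0-based half-open ranges.
std::vector<double> block_diag_multiply(const CrsMatrix& background,
                                        const std::vector<CrsMatrix>& blocks,
                                        const std::vector<double>& x) {
    assert(background.rows == x.size());
    std::vector<double> b(x.size(), 0.0);
    background.multiply_add(x.data(), b.data());

    // Prefix sum of block sizes: offset[i] is the first row of block i.
    std::vector<std::size_t> offset(blocks.size() + 1, 0);
    for (std::size_t i = 0; i < blocks.size(); ++i)
        offset[i + 1] = offset[i] + blocks[i].rows;
    assert(offset.back() == x.size());

    // Each block updates a disjoint slice of b, so no locking is needed.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(blocks.size()); ++i) {
        const std::size_t s = offset[i];
        blocks[i].multiply_add(x.data() + s, b.data() + s);
    }
    return b;
}
```

Without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially. `schedule(dynamic)` is one possible choice when block sizes differ a lot; whether it helps would have to be measured.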

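The other place where parallelization looks like a good option is the MxV operation within each subdomain storage scheme (the calls on lines 1 and 6 of the code above). There is plenty of material on parallelizing CRS, CDS, and dense MxV operations. Typically there is a nice boost with 2 threads, with sharply diminishing returns as more threads are added.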

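I am envisioning a scheme in which 4 threads are used in the block loop above, and each of those threads uses 2 threads for the subdomain products. However, I am not sure how to manage the thread pool with OpenMP: is it possible to limit the number of threads in an openmp for loop? Is this multi-level parallelism something that makes sense in practice? Any other thoughts on what I have proposed would be appreciated (and thanks for reading all the way to the end!)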

+5
2 answers

Please note that everything I describe is implementation-dependent.

Is it possible to limit the number of threads in an openmp for loop?

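Yes. There are several ways to do it. Call omp_set_nested(1); and use something like #pragma omp parallel for num_threads(4) on your outer loop and #pragma omp parallel for num_threads(2) on your inner loop. This should give you 8 threads (depending on the implementation, you may also have to set OMP_THREAD_LIMIT if you have fewer than 8 cores).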
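A minimal sketch of that 4 x 2 layout, assuming dense blocks for simplicity. `DenseBlock`, `nested_block_multiply`, and the `offset` array (first row of each block, with a final entry equal to the total size) are hypothetical names made up for the example. It uses omp_set_max_active_levels(2), available since OpenMP 3.0, because omp_set_nested is deprecated as of OpenMP 5.0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical dense square block, row-major; stands in for any block format.
struct DenseBlock {
    std::size_t n;
    std::vector<double> a;  // n * n entries
};

// y += blockdiag(blocks) * x using two levels of parallelism.
void nested_block_multiply(const std::vector<DenseBlock>& blocks,
                           const std::vector<std::size_t>& offset,
                           const std::vector<double>& x,
                           std::vector<double>& y) {
    assert(offset.size() == blocks.size() + 1);
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // permit two active nested levels
#endif
    // Outer level: up to 4 threads share the blocks.
    #pragma omp parallel for num_threads(4)
    for (long i = 0; i < static_cast<long>(blocks.size()); ++i) {
        const DenseBlock& blk = blocks[i];
        const std::size_t s = offset[i];
        // Inner level: each outer thread forks up to 2 threads over the rows.
        #pragma omp parallel for num_threads(2)
        for (long r = 0; r < static_cast<long>(blk.n); ++r) {
            double sum = 0.0;
            for (std::size_t c = 0; c < blk.n; ++c)
                sum += blk.a[r * blk.n + c] * x[s + c];
            y[s + r] += sum;  // each row is written by exactly one thread
        }
    }
}
```

num_threads is a request, not a guarantee; the runtime may give you fewer threads at either level.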

Alternatively, you can unroll your loops manually, e.g. using something like

#pragma omp parallel sections
{
     #pragma omp section
     { /* do your stuff for the first part, nest a parallel region again */ }
     #pragma omp section
     { /* and so on for the other parts */ }
}

You can sometimes do the same thing more efficiently in OpenMP 3.0 with #pragma omp task.
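A rough sketch of the task variant: one thread creates a task per block and the rest of the team picks them up. `DenseBlock`, `task_block_multiply`, and `offset` are hypothetical names, and dense blocks are an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense square block, row-major.
struct DenseBlock {
    std::size_t n;
    std::vector<double> a;  // n * n entries
};

// y += blockdiag(blocks) * x, one OpenMP task per block.
void task_block_multiply(const std::vector<DenseBlock>& blocks,
                         const std::vector<std::size_t>& offset,
                         const std::vector<double>& x,
                         std::vector<double>& y) {
    assert(offset.size() == blocks.size() + 1);
    #pragma omp parallel
    #pragma omp single
    {
        // A single thread generates the tasks; idle threads execute them.
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            // i is firstprivate by default here; blocks, offset, x, y are shared.
            #pragma omp task
            {
                const DenseBlock& blk = blocks[i];
                const std::size_t s = offset[i];
                for (std::size_t r = 0; r < blk.n; ++r) {
                    double sum = 0.0;
                    for (std::size_t c = 0; c < blk.n; ++c)
                        sum += blk.a[r * blk.n + c] * x[s + c];
                    y[s + r] += sum;
                }
            }
        }
    }  // the barrier closing the region guarantees all tasks have finished
}
```

Tasks are scheduled dynamically, so uneven block sizes balance out without any hand tuning, at the cost of some task-management overhead.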

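Or you can start 8 threads, get the current thread number inside the parallel section, and schedule the work manually based on that number.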
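A small sketch of scheduling by thread number. The round-robin assignment is just the simplest possible policy, chosen for illustration; a real scheduler could use what it knows about block sizes. `DenseBlock`, `manual_block_multiply`, and `offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical dense square block, row-major.
struct DenseBlock {
    std::size_t n;
    std::vector<double> a;  // n * n entries
};

// y += blockdiag(blocks) * x, with blocks assigned to threads by hand.
void manual_block_multiply(const std::vector<DenseBlock>& blocks,
                           const std::vector<std::size_t>& offset,
                           const std::vector<double>& x,
                           std::vector<double>& y) {
    assert(offset.size() == blocks.size() + 1);
    #pragma omp parallel num_threads(8)
    {
        std::size_t tid = 0, nthreads = 1;  // serial fallback without OpenMP
#ifdef _OPENMP
        tid = static_cast<std::size_t>(omp_get_thread_num());
        nthreads = static_cast<std::size_t>(omp_get_num_threads());  // may be fewer than 8
#endif
        // Round-robin: thread tid handles blocks tid, tid + nthreads, ...
        for (std::size_t i = tid; i < blocks.size(); i += nthreads) {
            const DenseBlock& blk = blocks[i];
            const std::size_t s = offset[i];
            for (std::size_t r = 0; r < blk.n; ++r) {
                double sum = 0.0;
                for (std::size_t c = 0; c < blk.n; ++c)
                    sum += blk.a[r * blk.n + c] * x[s + c];
                y[s + r] += sum;
            }
        }
    }
}
```

Querying omp_get_num_threads() inside the region, rather than assuming 8, keeps the result correct when the runtime hands out a smaller team.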

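Finally, if you have a perfectly nested loop (a loop is perfectly nested if the actual work happens only in the innermost loop), you can rewrite everything as a single loop. Basically, pack your two iterators i and j into one big iterator (i, j). Note that this may reduce locality and therefore decrease performance.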
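To illustrate the packing, here is a sketch that assumes all blocks are dense and have the same size n, which is a simplification that will not hold for mixed block formats. `flat_block_multiply` and the flat storage layout are hypothetical. The flat index k is unpacked into the block index i and the row index j.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y += blockdiag(A_0, ..., A_{nblocks-1}) * x, all blocks dense n-by-n.
// a holds the blocks back to back, each one row-major.
void flat_block_multiply(std::size_t nblocks, std::size_t n,
                         const std::vector<double>& a,
                         const std::vector<double>& x,
                         std::vector<double>& y) {
    assert(a.size() == nblocks * n * n);
    assert(x.size() == nblocks * n && y.size() == nblocks * n);
    const long total = static_cast<long>(nblocks * n);
    // One flat loop over all (block, row) pairs.
    #pragma omp parallel for
    for (long k = 0; k < total; ++k) {
        const std::size_t i = static_cast<std::size_t>(k) / n;  // block index
        const std::size_t j = static_cast<std::size_t>(k) % n;  // row in block
        const double* row = &a[(i * n + j) * n];
        double sum = 0.0;
        for (std::size_t c = 0; c < n; ++c)
            sum += row[c] * x[i * n + c];
        y[i * n + j] += sum;
    }
}
```

For rectangular loop nests, the collapse(2) clause (OpenMP 3.0 and later) asks the compiler to do the same packing for you.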

Is this multi-level parallelism something that makes sense in practice?

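It depends, and you will have to find out for yourself. In general, multi-level parallelism makes your problem more scalable. Scheduling, however, can become more complicated. This paper might be interesting.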

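Regarding setting the number of threads manually: the main advantage is that you can use specific knowledge about your problem when scheduling. That way you can reduce overhead and get better locality in the code being executed, and hence more cache hits and less main memory I/O.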

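The main disadvantage of setting the number of threads manually in nested parallelism is that threads in the innermost loop may wait idly at the implicit barrier while additional work could be done. Also, coarse-grained parallelism does not scale well. So if the iterations of your outer loop have very different run times, you want to schedule more flexibly than simply splitting into 4 threads.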

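Any other thoughts: have you considered doing the MxV with SIMD? Depending on the architecture, this can give a speed-up of 2-4.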
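One low-effort way to try SIMD is the `#pragma omp simd` directive (OpenMP 4.0 and later) on the inner dot product. This is only a sketch for a dense row-major matrix; `dense_multiply_simd` is a hypothetical name, and intrinsics or a vectorized BLAS are other routes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for a dense row-major n-by-m matrix stored in a.
void dense_multiply_simd(std::size_t n, std::size_t m,
                         const std::vector<double>& a,
                         const std::vector<double>& x,
                         std::vector<double>& y) {
    assert(a.size() == n * m && x.size() == m && y.size() == n);
    for (std::size_t r = 0; r < n; ++r) {
        const double* row = a.data() + r * m;
        double sum = 0.0;
        // Ask the compiler to vectorize the dot product; the reduction
        // clause lets it keep several partial sums in vector lanes.
        #pragma omp simd reduction(+ : sum)
        for (std::size_t c = 0; c < m; ++c)
            sum += row[c] * x[c];
        y[r] = sum;
    }
}
```

The reduction reorders the floating-point additions, so results can differ from the scalar loop in the last few bits. How much speed-up you actually get depends on the hardware and on memory bandwidth, so measure it.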

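For MxV, loop tiling, register and cache blocking, and related techniques can improve data locality and reduce other problems such as false sharing. This book, chapter 11 (you can preview it), may give you some additional ideas on how to restructure data access.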
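As a rough illustration of tiling, the sketch below splits the columns of a dense row-major matrix into tiles so that the matching slice of x stays in cache while all rows are swept. `dense_multiply_tiled` is a hypothetical name and the default tile width of 256 is an arbitrary placeholder that would need tuning for the actual cache sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for a dense row-major n-by-m matrix, tiled over columns.
void dense_multiply_tiled(std::size_t n, std::size_t m,
                          const std::vector<double>& a,
                          const std::vector<double>& x,
                          std::vector<double>& y,
                          std::size_t tile = 256) {
    assert(tile > 0);
    assert(a.size() == n * m && x.size() == m && y.size() == n);
    std::fill(y.begin(), y.end(), 0.0);
    // Process one column tile at a time; x[c0..c1) is reused by every row.
    for (std::size_t c0 = 0; c0 < m; c0 += tile) {
        const std::size_t c1 = std::min(c0 + tile, m);
        for (std::size_t r = 0; r < n; ++r) {
            double sum = 0.0;
            for (std::size_t c = c0; c < c1; ++c)
                sum += a[r * m + c] * x[c];
            y[r] += sum;  // accumulate this tile's partial product
        }
    }
}
```

For sparse formats such as CRS the same idea needs more care, since the column indices are irregular; reordering or blocking the matrix itself is the usual approach there.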

+4
