Access to matrix elements by a series of columns and columns

The matrix A[i][j] . If we want to add matrix elements, which method is better and why?

  • column wise
  • row wise

From my point of view, a reasonable number is better, because in the presentation elements the arrays are stored in adjacent memory cells, therefore access to them takes less time. But since sampling of each place in RAM takes equal time, does it matter?

+8
c arrays
source share
3 answers

Take advantage of spatial locality

In C, matrices are stored in r ow-major order . Therefore, if you access the element a[i][j] , access to the element a[i][j+1] will most likely go to the cache. There will be no access to the main memory. The cache is faster than main memory, so the access pattern matters.

Of course, you need to consider more factors, such as write / read access, write policy (write, write back / write, lack of write allocation), multi-level caches, etc. But that seems redundant for this question.

Get to know a profiling tool like cachegrind and see for yourself.

For example, consider a dummy program that has 4MB matrices. Check the differences between the skip values ​​for each access pattern.

column access

 $ cat col_major.c #include <stdio.h> int main(){ size_t i,j; const size_t dim = 1024 ; int matrix [dim][dim]; for (i=0;i< dim; i++){ for (j=0;j <dim;j++){ matrix[j][i]= i*j; } } return 0; } $ valgrind --tool=cachegrind ./col_major ==3228== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr) ==3228== D1 misses: 1,049,704 ( 968 rd + 1,048,736 wr) ==3228== L2d misses: 1,049,623 ( 893 rd + 1,048,730 wr) ==3228== D1 miss rate: 9.9% ( 0.0% + 98.3% ) ==3228== L2d miss rate: 9.9% ( 0.0% + 98.3% ) 

line access

 $ cat row_major.c #include <stdio.h> int main(){ size_t i,j; const size_t dim = 1024 ; int matrix [dim][dim]; for (i=0;i< dim; i++) for (j=0;j <dim;j++) matrix[i][j]= i*j; return 0; } $ valgrind --tool=cachegrind ./row_major ==3524== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr) ==3524== D1 misses: 66,664 ( 968 rd + 65,696 wr) ==3524== L2d misses: 66,583 ( 893 rd + 65,690 wr) ==3524== D1 miss rate: 0.6% ( 0.0% + 6.1% ) ==3524== L2d miss rate: 0.6% ( 0.0% + 6.1% ) 
+24
source share

If arrays are small, this is not important. If they are large, then the reading time may be affected. The big problem is the cache. If you cannot expect your full matrix to be loaded into the cache right away, then you want to minimize the number of cache misses that you encounter, since working with a cache miss is relatively time-consuming.

If the arrays are really large, then you can achieve even greater results by causing more page sharing than necessary.

+2
source share

For C, the best way to handle multidimensional arrays is:

 int a[MAX_I][MAX_J]; for (i = 0; i < MAX_I; ++i) { for (j = 0; j < MAX_J; ++j) { /* process a[i][j] */ } } 

The reason for this is that the C language treats arrays as pointers with offsets, see: C. programming language .

0
source share

Source: https://habr.com/ru/post/651176/


All Articles