I get a strange result when using global variables. This question was inspired by another question. In the code below, if I change
int ncols = 4096;
to
static int ncols = 4096;
or
const int ncols = 4096;
the code is much faster, and the assembly is much simpler.
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;
//static int ncols = 4096;
char* buff;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
I get the same speedup if char* buff is an automatic variable (i.e. not global or static). That is:
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    char* buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
If I change buff to a short pointer, the code is fast, and the performance does not depend on whether ncols is static or const, or on whether buff is automatic. However, when I make buff an int* pointer, I observe the same effect as with char*.
I thought this might be due to pointer aliasing, so I also tried
void func(int * restrict pbuff, int * restrict _nrows, int * restrict _ncols)
but it made no difference.
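Another way to probe the aliasing idea would be a variant that reads the bounds once, before any store through pbuff. This is only a sketch under that assumption: func_local_bounds, nrows_local, and ncols_local are hypothetical names that are not part of the benchmark above, and I make no claim here about how it performs.

```c
/* Hypothetical variant of func (not part of the benchmark above).
   The bounds are copied into locals before the loops, so a store
   through pbuff cannot change the values the loops compare against. */
void func_local_bounds(char *pbuff, const int *_nrows, const int *_ncols)
{
    const int nrows_local = *_nrows;  /* loaded once, before any store */
    const int ncols_local = *_ncols;  /* loaded once, before any store */

    for (int i = 0; i < nrows_local; i++) {
        for (int j = 0; j < ncols_local; j++) {
            *pbuff += 1;  /* increment the current byte */
            pbuff++;      /* advance to the next byte */
        }
    }
}
```

It takes the same arguments as func, so it could be swapped into the timing loop as func_local_bounds(buff, &nrows, &ncols).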
Here are my questions:
- When buff is a global char* or int* pointer, why is the code faster when ncols is static or const?
- Why does making buff an automatic variable, instead of global or static, make the code faster?
- Why does it make no difference when buff is a short pointer?
- If this is due to aliasing, why does restrict have no noticeable effect?
Please note that I use omp_get_wtime() simply because it is convenient for timing.
c gcc pointers global-variables
Z boson