Unexpected performance with global variables

I get a strange result when using global variables. This question was inspired by another question. In the code below, if I change

int ncols = 4096; 

to

 static int ncols = 4096; 

or

 const int ncols = 4096; 

the code is much faster, and the assembly is much simpler.

    //c99 -O3 -Wall -fopenmp foo.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <omp.h>

    int nrows = 4096;
    int ncols = 4096;
    //static int ncols = 4096;
    char* buff;

    void func(char* pbuff, int * _nrows, int * _ncols) {
        for (int i=0; i<*_nrows; i++) {
            for (int j=0; j<*_ncols; j++) {
                *pbuff += 1;
                pbuff++;
            }
        }
    }

    int main(void) {
        buff = calloc(ncols*nrows, sizeof*buff);
        double dtime = -omp_get_wtime();
        for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
        dtime += omp_get_wtime();
        printf("time %.16e\n", dtime/100);
        return 0;
    }

I get the same result if char* buff is an automatic variable (i.e. not global or static). That is:

    //c99 -O3 -Wall -fopenmp foo.c
    #include <stdlib.h>
    #include <stdio.h>
    #include <omp.h>

    int nrows = 4096;
    int ncols = 4096;

    void func(char* pbuff, int * _nrows, int * _ncols) {
        for (int i=0; i<*_nrows; i++) {
            for (int j=0; j<*_ncols; j++) {
                *pbuff += 1;
                pbuff++;
            }
        }
    }

    int main(void) {
        char* buff = calloc(ncols*nrows, sizeof*buff);
        double dtime = -omp_get_wtime();
        for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
        dtime += omp_get_wtime();
        printf("time %.16e\n", dtime/100);
        return 0;
    }

If I change buff to a short pointer, the code is fast and its performance does not depend on whether ncols is static or const, or on whether buff is automatic. However, when I make buff an int* pointer, I observe the same effect as with char*.

I thought it might be due to pointer aliasing, so I also tried

 void func(int * restrict pbuff, int * restrict _nrows, int * restrict _ncols) 

but it made no difference.

Here are my questions

  • When buff is either a char* pointer or a global int* pointer, why is the code faster when ncols has file scope (static) or is const?
  • Why does making buff an automatic variable, instead of global or static, make the code faster?
  • Why does it make no difference when buff is a short pointer?
  • If this is due to pointer aliasing, why does restrict have no noticeable effect?

Please note that I use omp_get_wtime() simply because it is convenient for timing.

+7
c gcc pointers global-variables
2 answers

As has been noted, several factors let GCC behave differently when optimizing this code; the optimization with the greatest impact here is probably loop vectorization.

Why is the code faster?

The code is faster because its hot part, the loops in func, has been optimized by auto-vectorization. When ncols is qualified with static or const, GCC indeed emits:

 note: loop vectorized
 note: loop peeled for vectorization to enhance alignment

These notes are shown if you enable -fopt-info-loop, -fopt-info-vec, or those options combined with a further -optimized suffix, which has the same effect.


Why does making buff an automatic variable, instead of global or static, make the code faster?

When buff is global, GCC cannot compute the number of iterations, which it needs in order to apply vectorization. This is again due to the storage of buff, which has external linkage unless specified otherwise. The whole vectorization pass is skipped immediately, unlike when buff is local, where it carries on and succeeds.
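A common workaround, which is not part of the answer above, is to copy the bounds into local variables before the loops so that stores through pbuff cannot affect them. The sketch below assumes pbuff never points at the bounds themselves, and func_hoisted is a hypothetical name. Whether GCC then vectorizes the loop should be checked with -fopt-info-vec rather than assumed.

```c
#include <stddef.h>

/* Hypothetical variant of func (the name is not from the question).
   Assumption: pbuff does not overlap *_nrows or *_ncols. */
void func_hoisted(char *pbuff, const int *_nrows, const int *_ncols)
{
    /* Read the bounds once. Stores through pbuff cannot modify these
       locals, so the trip count is fixed before the loops start. */
    const int nrows = *_nrows;
    const int ncols = *_ncols;

    for (int i = 0; i < nrows; i++) {
        for (int j = 0; j < ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}
```

It is called exactly like the original, e.g. func_hoisted(buff, &nrows, &ncols). The behavior differs from func only if pbuff does overlap one of the bounds.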

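Why does it make no difference when buff is a short pointer?

Why should it? func takes a char*, which may alias anything. A short*, by contrast, may not alias an int object under C's aliasing rules, so GCC can assume that stores through it leave *_nrows and *_ncols unchanged.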
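To illustrate why a char* is special, the sketch below writes through a char* that deliberately points at the loop bound itself. count_iterations is a hypothetical helper, not code from the question. C permits this access, so the compiler has to reload *bound on every iteration unless it can prove that the two do not overlap.

```c
/* Hypothetical helper with the same shape as the inner loop of func.
   Returns how many iterations actually ran. */
int count_iterations(char *p, int *bound)
{
    int n = 0;
    for (int j = 0; j < *bound; j++) {
        *p = 0;   /* legal even if p points into *bound */
        p++;
        n++;
    }
    return n;
}
```

With a separate buffer and a bound of 4 the loop runs four times. With p set to (char *)&bound, the stores overwrite bytes of the bound and the loop ends with the bound changed, so the compiler cannot treat the trip count as fixed.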

If this is due to pointer aliasing, why does restrict have no noticeable effect?

I don't think it is, because GCC can see that the pointers do not alias when func is called, so restrict is not needed.

+2

A const will most likely always yield code that is faster than, or as fast as, code using a read/write variable, since the compiler knows the variable will not change, which in turn enables many optimizations.

Declaring a file-scope variable as int or as static int should not have a big impact on performance, since it will still be allocated in the same place: the .data section.

But, as mentioned in the comments, if the variable is global, the compiler may have to assume that some other file (translation unit) can modify it, which blocks some optimizations. I guess this is what happens here.
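The difference in linkage can be sketched as follows. The names ncols_global, ncols_local and sum_bounds are made up for illustration and do not appear in the question.

```c
/* External linkage: another translation unit may declare
   `extern int ncols_global;` and write to it, so the compiler
   cannot assume it still holds 4096 when it is read. */
int ncols_global = 4096;

/* Internal linkage: only this file can name the variable. If this
   file never writes to it and never lets its address escape, the
   compiler is free to treat it as the constant 4096. */
static int ncols_local = 4096;

/* Reads both variables; only the second read can be folded safely. */
int sum_bounds(void)
{
    return ncols_global + ncols_local;
}
```

In the question's code the address of ncols is passed to func, so the compiler still has to reason about that call; the sketch only shows the linkage part of the argument.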

But this should never be a problem in the first place, since there is never a reason to declare a global variable in C, period. Always declare such variables static to prevent them from being abused for spaghetti coding.

In general, I would also question the test results. On Windows, you should use QueryPerformanceCounter and the like. https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx

+1