I don't know how to optimize cache performance at a really low level, thinking about cache-line size or cache associativity. That isn't something you can learn overnight. Given that my program will run on many different systems and architectures, I don't expect those cache details to be the same everywhere anyway. Still, there are probably some general steps I can take to reduce cache misses.
Here is a description of my problem:
I have a three-dimensional array of integers representing values at points in space, indexed as [x][y][z]. All three dimensions have the same size, so the array is a cube. From it I need to build another 3D array in which each value is a function of 7 parameters: the corresponding value in the original array plus the values at the 6 neighboring indices that "touch" it in space (±1 along each axis). I'm not worried about the edges and corners of the cube right now.
Here is what I mean, in C++:
// LENGTH (a compile-time constant) and process() are defined elsewhere in my program.
void process3DArray(int input[LENGTH][LENGTH][LENGTH],
                    int output[LENGTH][LENGTH][LENGTH])
{
    // The loops start at 1 and stop before LENGTH-1,
    // otherwise I'd get out-of-bounds errors.
    // I'm not concerned with the edges and corners of the
    // 3D array "cube" at the moment.
    for (int i = 1; i < LENGTH - 1; i++)
        for (int j = 1; j < LENGTH - 1; j++)
            for (int k = 1; k < LENGTH - 1; k++)
            {
                int value = input[i][j][k];

                // I am expecting crazy cache misses here:
                int posX = input[i + 1][j][k];
                int negX = input[i - 1][j][k];
                int posY = input[i][j + 1][k];
                int negY = input[i][j - 1][k];
                int posZ = input[i][j][k + 1];
                int negZ = input[i][j][k - 1];

                output[i][j][k] = process(value, posX, negX, posY, negY, posZ, negZ);
            }
}
It seems that when LENGTH is large enough, I get tons of cache misses when fetching the parameters for process(). Is there a way to make this code more cache-friendly, or is there a better way to represent my data than a 3D array?
And if you have time for a few more questions: do I need to take the value of LENGTH into account? For example, does it matter whether LENGTH is 20, 100, or 10,000? And would I need to do anything differently if I used something other than integers, say a 64-byte structure?
@ildjarn:
Sorry, I didn't think the code that generates the arrays I pass to process3DArray mattered. If it does, I'd like to know why.
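#include <cstdlib>  // for rand()

int main()
{
    // static: two LENGTH^3 int arrays can easily overflow the stack
    // when LENGTH is large.
    static int data[LENGTH][LENGTH][LENGTH];
    for (int i = 0; i < LENGTH; i++)
        for (int j = 0; j < LENGTH; j++)
            for (int k = 0; k < LENGTH; k++)
                data[i][j][k] = rand() * (i + j + k);

    static int result[LENGTH][LENGTH][LENGTH];
    process3DArray(data, result);
}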