Walking an array forwards vs. backwards

Let me preface this by saying that I know such micro-optimizations are rarely cost-effective; I'm simply curious about how the hardware works. For all the cache numbers etc., I'm referring to an Intel x86-64 i5 processor. Obviously the numbers will differ for other CPUs.

I was always under the impression that walking an array forwards is faster than walking it backwards. I believed this was because fetching large amounts of data is done in the forward direction; that is, if I read byte 128, the cache line (assuming a 64-byte line length) would hold bytes 128-191 inclusive. So if the next byte I wanted was 129, it would already be in the cache.

However, after reading a little, I now get the impression that it doesn't actually matter? Since cache-line alignment picks the starting point at the nearest boundary divisible by 64, if I start at byte 127 I will load bytes 64-127 inclusive, and will therefore have the data in the cache for my walk backwards. I will suffer a cache miss when crossing from 128 to 127, but that is only because I chose these addresses for the example, not for any real reason.

I know that cache lines are read in 8-byte chunks, so the full line would have to finish loading before the first operation could proceed if we were walking backwards, but I doubt that would make a terribly significant difference.

Could someone clarify whether present me is right and past me is wrong? I've been searching all day and still can't find a definitive answer.

tl;dr: Does the direction in which we walk an array really matter? Does it ever matter? Did it matter in the past? (As far back as 15 years or so.)

I tested with the following basic code and see the same results forwards and backwards:

#include <windows.h>
#include <iostream>

// Size of dataset
#define SIZE_OF_ARRAY 1024*1024*256

// Are we walking forwards or backwards?
#define FORWARDS 1

int main() {
    // Timer setup
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;

    int* intArray = new int[SIZE_OF_ARRAY];

    // Memset - shouldn't affect the test, because my cache isn't 256MB!
    // (memset takes a size in bytes, so multiply by sizeof(int);
    // otherwise three quarters of the array is left uninitialized.)
    memset(intArray, 0, SIZE_OF_ARRAY * sizeof(int));

    // Arbitrary sentinel values for the break conditions
    intArray[SIZE_OF_ARRAY - 1] = 55;
    intArray[0] = 15;
    int* backwardsPtr = &intArray[SIZE_OF_ARRAY - 1];

    QueryPerformanceFrequency(&Frequency);
    QueryPerformanceCounter(&StartingTime);

    // Actual code
    if (FORWARDS) {
        while (true) {
            if (*(intArray++) == 55)
                break;
        }
    } else {
        while (true) {
            if (*(backwardsPtr--) == 15)
                break;
        }
    }

    // Cleanup
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

    std::cout << ElapsedMicroseconds.QuadPart << std::endl;

    // So I can read the output
    char a;
    std::cin >> a;
    return 0;
}

Apologies for A) the Windows-specific code and B) the hackiness. It was thrown together to test a hypothesis, not as a rigorous proof.

It would also be great to hear about any ways walk direction can matter beyond the cache, in other parts of the hardware!

c++ caching memory
1 answer

As your experimentation shows, there is no difference. Unlike the interface between the processor and the L1 cache, the memory system transfers whole cache lines, not individual bytes. As @user657267 pointed out, there are CPU-specific prefetchers. These could conceivably favor forwards over backwards, but I strongly doubt it. All modern prefetchers detect the direction rather than assuming one; beyond that, they also detect the stride. They involve incredibly complex logic, and something as easy as direction isn't going to trip them up.

Short answer: go in whichever direction you like and enjoy the same performance either way!

