A simple test for measuring cache line sizes

After reading Igor Ostrovsky's article "Gallery of Processor Cache Effects", I wanted to try his examples on my machine. This is my code for the first example, which examines how touching different cache lines affects the running time:

    #include <iostream>
    #include <time.h>

    using namespace std;

    int main(int argc, char* argv[])
    {
        int step = 1;

        const int length = 64 * 1024 * 1024;
        int* arr = new int[length];

        timespec t0, t1;
        clock_gettime(CLOCK_REALTIME, &t0);
        for (int i = 0; i < length; i += step)
            arr[i] *= 3;
        clock_gettime(CLOCK_REALTIME, &t1);

        long int duration = (t1.tv_nsec - t0.tv_nsec);
        if (duration < 0)
            duration = 1000000000 + duration;

        cout << step << ", " << duration / 1000 << endl;

        return 0;
    }

Trying different values for the step, I do not see the expected jump in running time:

    step, microseconds
    1, 451725
    2, 334981
    3, 287679
    4, 261813
    5, 254265
    6, 246077
    16, 215035
    32, 207410
    64, 202526
    128, 197089
    256, 195154

I would expect to see something similar to the results in the article, where, starting at a step of 16, the running time is halved every time the step is doubled.

I am testing on Ubuntu 13 with a Xeon X5450, compiling with g++ -O0. Is something wrong with my code, or are these results actually correct? Any insight into what I am missing would be greatly appreciated.

2 answers

As far as I can see, you want to observe the effect of the cache line size. I recommend cachegrind, which is part of the Valgrind toolkit. Your approach is sound, but timing alone will not get you close to the results you are after.

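    #include <iostream>
    #include <time.h>
    #include <stdlib.h>

    using namespace std;

    int main(int argc, char* argv[])
    {
        // The step is taken from the first command-line argument.
        if (argc < 2) {
            cerr << "usage: " << argv[0] << " <step>" << endl;
            return 1;
        }
        int step = atoi(argv[1]);
        if (step <= 0)  // a non-positive step would never finish the loop
            return 1;

        const int length = 64 * 1024 * 1024;
        int* arr = new int[length];

        for (int i = 0; i < length; i += step)
            arr[i] *= 3;

        return 0;
    }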

Run valgrind --tool=cachegrind ./a.out $cacheline-size (the argument is the step value the program reads) and you should see the results. After that, you should get the results you are looking for with good accuracy. Happy experimenting!

    public class CacheLine {

        public static void main(String[] args) {
            CacheLine cacheLine = new CacheLine();
            cacheLine.startTesting();
        }

        private void startTesting() {
            byte[] array = new byte[128 * 1024];
            for (int testIndex = 0; testIndex < 10; testIndex++) {
                testMethod(array);
                System.out.println("--------- // ---------");
            }
        }

        private void testMethod(byte[] array) {
            for (int len = 8192; len <= array.length; len += 8192) {
                long t0 = System.nanoTime();
                for (int i = 0; i < 10000; i++) {
                    for (int k = 0; k < len; k += 64) {
                        array[k] = 1;
                    }
                }
                long dT = System.nanoTime() - t0;
                System.out.println("len: " + len / 1024 + " dT: " + dT + " dT/stepCount: " + (dT) / len);
            }
        }
    }

This code will help you determine the size of the L1 data cache. You can read more about it here: https://medium.com/@behzodbekqodirov/threading-in-java-194b7db6c1de#.kzt4w8eul
