Is it possible to lock some data in the CPU cache?

I have a problem: I write data into an array inside a loop, and I do this very often. This write now seems to be the bottleneck in my code, and I assume it is caused by writing to memory. The array is not very big (something like 300 elements). The question is: can I keep it in the cache and update it in memory only after the while loop completes?

First of all, I would like to thank you all for the answers. Indeed, it was a bit silly not to post the code, so I am doing it now.

```cpp
double* array1 = new double[1000000]; // this array has the elements
unsigned long* array2 = new unsigned long[300]; // fixed: `new` was missing here
double varX, t = 0, sum = 0;
int iter = 0, i = 0;
// nm0, nm1, max_steps and difX are declared elsewhere
while (i <= max_steps) {
    varX += difX;
    nm0 = int(varX);
    if (nm1 != nm0) {
        array2[iter] = nm0; // if you comment out this line, the application runs more than 2 times faster :)
        nm1 = nm0;
        t = array1[nm0];    // if you comment out this line, there is almost no change in run time
        ++iter;
    }
    sum += t;
    ++i;
}
```

That's all. It would be nice if someone had any ideas. Thanks again.

Regards Alex

+6
c++ cpu-cache
12 answers

Not intentionally, no. Among other things, you have no idea how big the cache is, so you have no idea whether what you want to lock would even fit. Besides, if an application were allowed to lock part of the cache, the consequences for the OS could be devastating for overall system performance. This goes straight onto my list of "you can't do it because you shouldn't do it."

What you can do is improve your locality of reference: try to organize the loop so that you don't access the elements more than once, and try to access them in memory order.

Without additional hints about your application, I don’t think that more specific advice can be given.

+13

CPUs generally don't offer fine-grained control over the cache: you aren't allowed to choose what gets evicted, or to pin things in the cache. You do have a few cache-related operations on some processors, though. Just to give you an idea of what is possible, here are some interesting cache-related instructions on newer x86{-64} processors (relying on these will hurt portability like hell, but I figured you might be interested):

Software Data Prefetch

The non-temporal instruction prefetchnta fetches the data into the second-level cache, minimizing cache pollution.

The temporal instructions are as follows:

 * prefetcht0 – fetches the data into all cache levels, that is, into the second-level cache for the Pentium® 4 processor.
 * prefetcht1 – identical to prefetcht0
 * prefetcht2 – identical to prefetcht0

In addition, there is a set of instructions for writing data to memory that explicitly tell the processor not to bring the data into the cache. These are called non-temporal instructions. An example is MOVNTI.

You could use the non-temporal instructions for every piece of data you do NOT want in the cache, in the hope that the rest will then stay cached. I don't know whether this would actually improve performance, as there are subtle behaviors to be aware of when dealing with the cache. It also sounds like it would be relatively painful to do.

+7

Unless your code does something completely different in between the writes to the array, most of the array will probably stay in the cache anyway.

Unfortunately, there is nothing you can do to influence what is in the cache other than rewriting your algorithm with the cache in mind. Try to use as little memory as possible between writes to the array: don't use many other variables, don't call many other functions, and try to write sequentially to the same region of the array.

+3

I have a problem: I write data into an array inside a while loop, and I do this very often. This write now seems to be the bottleneck in my code, and I assume it is caused by writing to memory. The array is not very big (something like 300 elements). The question is: can I keep it in the cache and update it in memory only after the while loop completes?

You don't need to. The only reason the array would be pushed out of the cache is that some other data is considered more urgent to cache.

Besides, an array of 300 elements should fit in the cache without problems (provided the element size is not too crazy), so most likely your data is already in the cache.

In any case, the most effective solution is probably tuning your code. Use lots of temporaries (to tell the compiler that the memory address is not important) rather than constantly reading from and writing to the array. Reorder your code so that the loads are done once, at the beginning of the loop, and break the dependency chains as much as possible.

Manually unrolling the loop gives you more flexibility to achieve these things.

And finally, two obvious tools that you should use rather than guessing about cache behavior:

  • A profiler, and cachegrind if available. A good profiler can give you lots of statistics about cache misses, and cachegrind provides a wealth of information too.
  • Us here at Stack Overflow. If you post your loop code and ask how its performance can be improved, I am sure quite a few of us will find it an interesting challenge.

But, as others have said, do not guess when working on performance. You need hard data and measurements, not hunches and gut feelings.

+3

I doubt this is possible, at least on a high-end multitasking operating system. You cannot guarantee that your process won't be pre-empted and lose the CPU. If your process then owned the cache, other processes couldn't use it, which would make their execution very slow and complicate things considerably. You really do not want a modern multi-GHz processor to run without a cache just because one application has locked all the others out of it.

+2

In this case, array2 will be quite "hot" and will stay in the cache for that reason alone. The trick is keeping array1 out of the cache (!). You read each element only once, so there is no point in caching it. The SSE instruction for this is MOVNTPD, intrinsic void _mm_stream_pd(double *destination, __m128d source).

+2

Even if you could, this is a bad idea.

Modern desktop computers use multi-core processors. Intel chips are the most common chips in desktop machines... but the Core and Core 2 processors do not share cache memory between cores.

That is, the cores didn't share a cache until the Core i7 chips were released, which share an 8 MB L3 cache.

So, even if you could lock data into the cache on the machine I am typing this on, you still couldn't guarantee that your process would be scheduled on the same core, so the cache lock could well be completely useless.

+1

If your writes are slow, make sure that no other CPU core is writing to the same memory area at the same time.

+1

Perhaps you could use some assembly code, or the intrinsics mentioned in another answer, to prefetch lines into the cache ahead of time, but that would require a lot of work to get right.

Just as an experiment, try reading in all the data first (in such a way that the compiler won't optimize it away), and then do the writes. See if that helps.

+1

When you have performance problems, don't assume anything; measure first. For example, comment out the writes and see whether the performance changes.

If you are writing to an array of structs, use a struct pointer to cache the address of the struct so that you don't redo the array index multiply on every access. Make sure you use the native word length for the array indexer variable for maximum optimization.

+1

As other people have said, you cannot control this directly, but restructuring your code can indirectly lead to better caching. If you are working on Linux and want a better idea of what is happening with the CPU cache when your program runs, you can use the Cachegrind tool, which is part of Valgrind. It simulates the processor, so it is not completely realistic, but it gives you information that is hard to get any other way.

+1

In the early boot stages, CoreBoot (formerly LinuxBIOS) has no access to RAM yet (we are talking about BIOS code, so RAM is not initialized at that point), so it sets up what it calls Cache-as-RAM (CAR): it uses the processor cache as RAM even though it is not backed by real RAM.

+1
