Why is Windows memory reallocation slowing down?

I want to understand why the following code behaves so differently on my Linux and Windows 7 machines. On Linux, each iteration takes about 120 ms. On Windows 7, the first iteration takes 0.4 seconds and subsequent iterations take progressively longer: iteration #8 already takes about 11 seconds, and iteration #22 takes about a minute.

I have observed this behavior on different hardware, so it appears to be a Windows issue rather than a hardware one.

    #include <iostream>
    #include <time.h>
    #include <chrono>

    void iteration()
    {
        int n = 25000;

        // Allocate memory
        long** blocks = new long*[n];
        for (int i = 0; i < n; ++i) {
            blocks[i] = new long[100008];
        }

        // Free all allocated memory
        for (int i = 0; i < n; ++i) {
            delete[] blocks[i];
        }
        delete[] blocks;
    }

    int main(int argc, char** argv)
    {
        int nbIter = 30;
        for (int i = 0; i < nbIter; ++i) {
            auto start = std::chrono::system_clock::now();
            iteration();
            auto end = std::chrono::system_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
            std::cout << "Iteration #" << i << ": time=" << elapsed.count() << "ms" << std::endl;
        }
        return 0;
    }

Can someone tell me what is happening here and how to make the code work stably on Windows?

Edit: On Windows I made a Release build in VS2013 and ran the program outside of VS. Here are some more precise runtimes (in seconds):

    Iteration #0: time=0.381000
    Iteration #1: time=0.391000
    Iteration #2: time=0.451000
    Iteration #3: time=1.507000
    Iteration #4: time=1.711000
    Iteration #5: time=2.938000
    Iteration #6: time=4.770000
    Iteration #7: time=7.840000
    Iteration #8: time=10.563000
    Iteration #9: time=14.606000
    Iteration #10: time=20.732000
    Iteration #11: time=24.948000
    Iteration #12: time=30.255000
    Iteration #13: time=34.608000
    Iteration #14: time=38.114000
    Iteration #15: time=43.217000
    Iteration #16: time=39.926000
    Iteration #17: time=43.506000
    Iteration #18: time=43.660000
    Iteration #19: time=45.291000
    Iteration #20: time=50.003000
3 answers

Digging around for references on the heap on Windows (as well as some infosec articles on it), there are some tidbits about common heap slowdowns, which list a few causes:

  • Slowdown as a result of allocation operations.
  • Slowdown as a result of free operations.
  • Slowdown as a result of frequent allocations and reallocations.

That helps explain why a slowdown might happen here (i.e. frequent frees and reallocations), but it doesn't really explain why it is happening.

First of all, it should be noted that sizeof(long) != sizeof(long) across these platforms: in the tests I did with 64-bit builds using g++ and Visual Studio 12, sizeof(long) was 4 on Windows and 8 on Linux. This matters when allocating and freeing memory. If you change the code from the long type to a type where sizeof(T) == 8 (for example long long), the problem disappears and the times become consistent between iterations. Example:

    void iteration()
    {
        int n = 25000;

        // Allocate memory
        long long** blocks = new long long*[n];
        for (int i = 0; i < n; ++i) {
            blocks[i] = new long long[100008];
        }

        // Free all allocated memory
        for (int i = 0; i < n; ++i) {
            delete[] blocks[i];
        }
        delete[] blocks;
    }
    // sizeof(long long) == 8 on my Linux/Unix and Windows 64-bit machines
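For reference, a quick check like the following (my own addition, not part of the original answer) confirms the platform-dependent sizes described above; on 64-bit Windows with MSVC it prints 4 and 8, on 64-bit Linux with g++ it prints 8 and 8:

    #include <iostream>

    int main()
    {
        // Compare the two integer types whose sizes differ between the platforms.
        std::cout << "sizeof(long)      = " << sizeof(long) << '\n'
                  << "sizeof(long long) = " << sizeof(long long) << '\n';
        return 0;
    }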

It should also be noted that the timing issue is only resolved for this particular code.

If you keep the long long type but change 100008 to, say, 16666, the problem reappears; furthermore, if you change it to 16668 and run the long long iteration immediately followed by the long version, the time goes up for the long long function and then back down for long, for example:

    template <typename T>
    void iteration()
    {
        int n = 25000;

        // Allocate memory
        T** blocks = new T*[n];
        for (int i = 0; i < n; ++i) {
            blocks[i] = new T[16668];
        }

        // Free all allocated memory
        for (int i = 0; i < n; ++i) {
            delete[] blocks[i];
        }
        delete[] blocks;
    }

    // called as, e.g.:
    const int nbItrs = 30;  // same iteration count as the question's main loop
    for (int i = 0; i < nbItrs; ++i) {
        iteration<long long>();  // time goes UP
    }
    for (int i = 0; i < nbItrs; ++i) {
        iteration<long>();       // time goes DOWN
    }

In addition, the posted code produces similar results using malloc/free, LocalAlloc/LocalFree and/or HeapAlloc/HeapFree, since new/malloc (on Windows) call HeapAlloc. The reason has to do with how Windows manages its heap memory and the locations of the freed blocks: when pages have to be removed, the free block list has to be scanned and may need to be adjusted.
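For illustration, here is a minimal sketch (my own, Windows-only, error handling omitted) of the same iteration written directly against the Win32 heap API, which is the kind of variant the comparison above refers to:

    #include <windows.h>

    void iteration_heapalloc()
    {
        const int n = 25000;
        HANDLE heap = GetProcessHeap();  // default process heap

        // Allocate the pointer array and the blocks straight from the heap API
        long** blocks = static_cast<long**>(HeapAlloc(heap, 0, n * sizeof(long*)));
        for (int i = 0; i < n; ++i) {
            blocks[i] = static_cast<long*>(HeapAlloc(heap, 0, 100008 * sizeof(long)));
        }

        // Free everything again
        for (int i = 0; i < n; ++i) {
            HeapFree(heap, 0, blocks[i]);
        }
        HeapFree(heap, 0, blocks);
    }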

It is this adjusting of the free block list that can take time, while searching for, and replacing or removing, the old memory blocks in the list. If the blocks do not fall on clean boundaries, additional adjustments to the heap's free block list might be needed.

Going deeper into how and why Windows heap management works this way would involve explaining the Windows kernel design and its memory management. Getting into that is beyond the scope of this question/answer, but some of the articles linked above give excellent overviews and explain the how and why quite well.

However, you asked:

How to make the code work stably on Windows?

As explained above, changing the type gives more consistent times. Additionally, as pointed out in another answer, deleting the blocks in the reverse order of their allocation also leads to more consistent timings:

    for (int i = n; i > 0; --i) {
        delete[] blocks[i - 1];
    }

This is because the Windows heap manager uses one or more linked lists to keep track of heap locations, which is why the time can go up during delete as the list is traversed, and why the time can be slower on Windows compared to Linux (though my tests actually produced similar times after making these changes).

Hope this helps.


An interesting problem. I was able to reproduce it.

I get consistent, albeit somewhat sluggish, performance by delete[]-ing the blocks in the reverse order of their allocation:

    for (int i = 0; i < n; ++i)
        delete[] blocks[n - 1 - i];

I suspect this may all relate to coalescing overhead; from MSDN:

Slowdown as a result of free operations. Free operations consume more cycles, mainly if coalescing is enabled. During coalescing, each free operation must "find" its neighbors, pull them out to construct a larger block, and reinsert the larger block into the free list. During that find, memory may be touched in a random order, causing cache misses and a performance slowdown.

There are some weird things about this:

  • my measurements showed that while delete[] took ~80% of the time for the first iteration or three, after half a dozen more iterations new[] was taking almost as long (see the timing sketch after this list).

  • the problem appeared quite suddenly when I went from new long[91134] to ...91135: that is just under 356 kB, but I was not able to relate that threshold to anything by searching around.
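As a minimal sketch (my own instrumentation, not from the original post), this is how the allocation and deallocation phases can be timed separately to produce figures like the ~80% above:

    #include <chrono>
    #include <iostream>

    void timed_iteration()
    {
        const int n = 25000;
        long** blocks = new long*[n];

        // Time the allocation loop
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i) {
            blocks[i] = new long[100008];
        }
        auto t1 = std::chrono::steady_clock::now();

        // Time the deallocation loop
        for (int i = 0; i < n; ++i) {
            delete[] blocks[i];
        }
        auto t2 = std::chrono::steady_clock::now();
        delete[] blocks;

        using ms = std::chrono::milliseconds;
        std::cout << "new[]   : " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
                  << "delete[]: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
    }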


A very interesting problem. I cannot reproduce it on Windows 10 with MS Visual Studio Community 2013. However, if you allocate and deallocate many fixed-size memory blocks, you could try replacing new/delete with a fixed-size block allocator, also called a memory pool. It works much faster and at a constant speed. You can find a BSD-licensed example here: https://github.com/intelmm/FixMemAlloc/blob/master/Sources/MemoryPool.h . Maybe this can help.
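To show the idea, here is a minimal fixed-size block pool sketch (my own illustration, not the code from the linked repository): one large allocation is carved into equally sized blocks that are handed out and returned through a free list, so the per-block cost is O(1) and the general-purpose heap is touched only once.

    #include <cstddef>
    #include <vector>

    class FixedBlockPool {
    public:
        FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
            : storage_(blockSize * blockCount)
        {
            // Pre-populate the free list with every block in the slab
            free_.reserve(blockCount);
            for (std::size_t i = 0; i < blockCount; ++i) {
                free_.push_back(storage_.data() + i * blockSize);
            }
        }

        // Returns nullptr when the pool is exhausted
        void* allocate()
        {
            if (free_.empty()) return nullptr;
            void* p = free_.back();
            free_.pop_back();
            return p;
        }

        // Caller must only return pointers previously obtained from allocate()
        void deallocate(void* p)
        {
            free_.push_back(static_cast<char*>(p));
        }

    private:
        std::vector<char>  storage_;  // one contiguous slab for all blocks
        std::vector<char*> free_;     // stack of currently unused blocks
    };

    // Usage in the question's iteration(): replace new long[100008] / delete[]
    // with pool.allocate() / pool.deallocate(), e.g.
    //     FixedBlockPool pool(100008 * sizeof(long), 25000);
    //     long* block = static_cast<long*>(pool.allocate());
    //     ...
    //     pool.deallocate(block);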

