Digging around for links about the heap on Windows (as well as some infosec articles), there are some tidbits about general heap slowdown, which list a few causes:
- Slowdown as a result of allocation operations.
- Slowdown as a result of free operations.
- Slowdown as a result of frequent frees and reallocations.
Which helps explain when the slowdown happens (i.e., frequent frees and reallocs), but it doesn't really explain why it happens.
First of all, it should be noted that sizeof(long) on Windows is not the same as sizeof(long) on Linux: in the tests I did with 64-bit builds using g++ and Visual Studio 12, sizeof(long) on Windows was 4, while on Linux it was 8. This matters when allocating/freeing memory. If you change the code from long to a type where sizeof(T) == 8 (for example, long long), the problem disappears and the times become consistent between iterations. Example:
```cpp
void iteration() {
    int n = 25000;

    // Allocate memory
    long long** blocks = new long long*[n];
    for (int i = 0; i < n; ++i) {
        blocks[i] = new long long[100008];
    }

    // Free all allocated memory
    for (int i = 0; i < n; ++i) {
        delete[] blocks[i];
    }
    delete[] blocks;
}
// sizeof(long long) == 8 on my Linux/Unix and Windows 64-bit machines
```
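To see the size difference on your own toolchain (this small check is my own illustration, not part of the original code), something like the following can be used; on 64-bit Windows (LLP64) long is typically 4 bytes, while on 64-bit Linux (LP64) it is 8, and long long is 8 on both:

```cpp
#include <iostream>

int main() {
    // Sizes are implementation-defined: LLP64 (64-bit Windows) keeps long at 4 bytes,
    // LP64 (64-bit Linux) makes long 8 bytes; long long is 8 bytes on both.
    std::cout << "sizeof(long)      = " << sizeof(long) << '\n';
    std::cout << "sizeof(long long) = " << sizeof(long long) << '\n';
    return 0;
}
```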
It should also be noted that the timing problem is fixed only for this particular code.
If you keep the long long type but change 100008 to, say, 16666, the problem reappears; further, if you change it to 16668 and run the long long iteration immediately followed by the long version, the time goes up for the long long function and then down for long, for example:
```cpp
template <typename T>
void iteration() {
    int n = 25000;

    // Allocate memory
    T** blocks = new T*[n];
    for (int i = 0; i < n; ++i) {
        blocks[i] = new T[16668];
    }

    // Free all allocated memory
    for (int i = 0; i < n; ++i) {
        delete[] blocks[i];
    }
    delete[] blocks;
}

for (int i = 0; i < nbItrs; ++i) {
    iteration<long long>(); // time goes UP
}
for (int i = 0; i < nbItrs; ++i) {
    iteration<long>();      // time goes DOWN
}
```
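For reference, a minimal way to time each call (this is just an illustrative harness of my own, not the original benchmark) would be something like:

```cpp
#include <chrono>
#include <iostream>

// Illustrative timing wrapper: measures one call to iteration<T>()
// and prints the elapsed milliseconds for that run.
template <typename T>
void timed_iteration(int run) {
    auto start = std::chrono::steady_clock::now();
    iteration<T>();
    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "run " << run << ": " << ms << " ms\n";
}
```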
In addition, the posted code produces similar results using malloc/free, LocalAlloc/LocalFree, and/or HeapAlloc/HeapFree, since new/malloc (on Windows) calls HeapAlloc. The reason lies in how Windows manages heap memory and free memory locations: when pages need to be released, the list of free blocks has to be walked and may need to be adjusted.
This adjustment can take some time while searching for and replacing or removing old memory blocks from the list. If the blocks do not fall on clean boundaries, additional adjustments to the heap's free block list may be needed.
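As a rough mental model of why that list work shows up in the timings (this is only a simplified sketch of an address-ordered free list, not Windows' actual heap implementation), consider that putting a freed block back into a sorted free list means walking the list, so each free gets more expensive as the list grows:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified illustration only: an address-ordered singly linked free list.
// The real Windows heap (front end, back end, low-fragmentation heap, ...) is
// far more elaborate, but the list walk captures the basic cost.
struct FreeBlock {
    std::size_t size;
    FreeBlock*  next;
};

FreeBlock* freeListHead = nullptr;

// Insert a freed block keeping the list sorted by address so adjacent blocks
// can later be coalesced; the walk is O(number of free blocks already on the list).
void insertFree(FreeBlock* block) {
    FreeBlock** cur = &freeListHead;
    while (*cur != nullptr &&
           reinterpret_cast<std::uintptr_t>(*cur) < reinterpret_cast<std::uintptr_t>(block)) {
        cur = &(*cur)->next;
    }
    block->next = *cur;
    *cur = block;
}
```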
Going deeper into how and why this relates to Windows heap management would mean explaining Windows kernel design and memory management; getting into that is beyond the scope of this question/answer, but some of the articles linked above give excellent overviews and explain the how and why.
However, you asked:

> How to make the code work stably on Windows?
As indicated above, changing the type gives more consistent timings; additionally, as mentioned in another answer, deleting the blocks in reverse order also leads to more consistent timings:
```cpp
for (int i = n; i > 0; --i) {
    delete[] blocks[i - 1];
}
```
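Putting the two suggestions together (this is just a sketch combining the type change and the reverse-order delete, not a separately tested variant), the iteration function would look like:

```cpp
void iteration() {
    int n = 25000;

    // Allocate memory using a type that is 8 bytes on both platforms
    long long** blocks = new long long*[n];
    for (int i = 0; i < n; ++i) {
        blocks[i] = new long long[100008];
    }

    // Free all allocated memory in reverse allocation order
    for (int i = n; i > 0; --i) {
        delete[] blocks[i - 1];
    }
    delete[] blocks;
}
```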
This is because the Windows memory manager uses one or more linked lists to keep track of heap locations, so the time can go up during delete as the list is walked, and that is why the times can be slower on Windows compared to Linux (although my tests actually produced similar times after making these changes).
Hope this helps.