Why do my memory tests give strange results?

I recently used some basic tests written in C# to try to determine why some seemingly identical Hyper-V remote workstations are much slower than others. Their results for most of the main tests I ran were completely identical, but the results of a basic memory-access test (specifically, the time taken to initialize a 1000x1000 two-dimensional array of doubles to 0) differed by a factor of 40.
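The OP's actual test harness is not shown; a minimal sketch of the kind of timing test described (class and method names here are my own assumptions) might look like this:

```csharp
using System;
using System.Diagnostics;

class InitBenchmark
{
    // Times the initialization of an n x n array of doubles to 0, in ms.
    public static double TimeInit(int n)
    {
        var sw = Stopwatch.StartNew();
        double[,] data = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                data[i, j] = 0.0;
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine($"1000x1000 init: {TimeInit(1000):F3} ms");
    }
}
```

As the answer below explains, a single timed pass like this measures far more than the loop itself.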

To explore this further, I ran several other experiments to narrow down the problem. Running the same test with an exponentially increasing array size (until an OutOfMemoryException occurs) shows no difference between the different remotes until the array size exceeds 1M elements, at which point a 40x difference appears immediately. In fact, the time taken to initialize grows in proportion to the array size up to exactly 999,999 elements; then, on the "slow" consoles, the time jumps by 900%, while on the "fast" consoles it drops by 70% as the size reaches 1000x1000. From there it continues to scale proportionally. The same thing happens with array dimensions of 1M x 1 and 1 x 1M, but to a much lesser extent (changes of +50% and -30% instead).

Interestingly, changing the data type used for the experiment to float seems to eliminate the phenomenon completely: there is no difference between the consoles in any test, and the timing appears fully proportional to the array size, even across the 1000x1000 and 2000x2000 breakpoints. Another interesting data point is that the local workstation I use seems to mirror the behavior of the slow remotes.

Does anyone know what system configuration settings could cause this effect and how they can be changed, or what can be done to debug the problem further?

arrays c# benchmarking
1 answer

You need to keep in mind what you are really testing. Most likely it is not the ability of a .NET program to assign array elements. That is very fast and, for a large array, normally proceeds at memory bus bandwidth rates: typically ~37 gigabytes per second, depending on the kind of RAM the machine has, with perhaps 5 GB/sec the worst you might encounter today (slowly clocked DDR2 in an old machine).
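A back-of-envelope check makes the point: 1000x1000 doubles is 8,000,000 bytes, so even at the low end of those bandwidth figures the raw writes take only a couple of milliseconds, far too little to explain a 40x gap on their own. A small sketch of the arithmetic:

```csharp
using System;

class BandwidthEstimate
{
    // Rough lower bound on the time to write 'bytes' once at the given bandwidth.
    public static double WriteMilliseconds(long bytes, double gigabytesPerSecond)
    {
        return bytes / (gigabytesPerSecond * 1e9) * 1000.0;
    }

    static void Main()
    {
        long bytes = 1000L * 1000 * sizeof(double); // 8,000,000 bytes
        Console.WriteLine($"at 37 GB/s: {WriteMilliseconds(bytes, 37):F3} ms"); // ~0.22 ms
        Console.WriteLine($"at  5 GB/s: {WriteMilliseconds(bytes, 5):F3} ms");  // ~1.6 ms
    }
}
```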

The new keyword only allocates address space on a demand-paged virtual memory operating system like Windows. It is just numbers to the processor, one page for every 4096 bytes.
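You can see this split between allocation and first touch in a small experiment. This is a sketch, not a rigorous benchmark; the exact numbers depend heavily on the GC, the OS, and whatever else the machine is doing:

```csharp
using System;
using System.Diagnostics;

class DemandPagingDemo
{
    // Returns (allocation ms, first-touch ms) for an array of 'count' doubles.
    public static (double alloc, double touch) Measure(int count)
    {
        var sw = Stopwatch.StartNew();
        double[] big = new double[count];   // reserves address space; little RAM committed yet
        sw.Stop();
        double alloc = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < big.Length; i += 512) // one write per 4096-byte page
            big[i] = 1.0;
        sw.Stop();
        return (alloc, sw.Elapsed.TotalMilliseconds);
    }

    static void Main()
    {
        var (alloc, touch) = Measure(10_000_000); // ~80 MB of doubles
        Console.WriteLine($"allocation:  {alloc:F3} ms");
        Console.WriteLine($"first touch: {touch:F3} ms");
    }
}
```

On a typical machine the allocation line is near-instant while the first-touch loop pays for the page faults, which is exactly the effect described in the next paragraph.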

It isn't until you start assigning the elements for the first time that the demand-paging feature kicks in and your code forces the operating system to allocate RAM for the array. Assigning an array element causes a page fault, one for every 4096 bytes in the array, or one for every 512 doubles in your case. The cost of handling those page faults is included in your measurement.
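Concretely, for the 1000x1000 double array that works out to roughly two thousand page faults. A quick calculation (assuming standard 4096-byte pages):

```csharp
using System;

class PageFaultCount
{
    // Number of 4096-byte pages needed to hold the array, rounded up.
    public static long PagesFor(long elementCount, int bytesPerElement)
    {
        long bytes = elementCount * bytesPerElement;
        return (bytes + 4095) / 4096;
    }

    static void Main()
    {
        // 1000x1000 doubles = 8,000,000 bytes => ~1954 first-touch page faults
        Console.WriteLine(PagesFor(1000L * 1000, sizeof(double)));
    }
}
```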

It is smooth sailing only when the OS has a zero-initialized RAM page ready for use. That typically takes a fat half a microsecond, give or take. Still a lot of CPU time: the processor is stalled while the OS updates the page mapping. Keep in mind that this only happens on the first access of an element; subsequent accesses are fast, since the RAM page remains available. Usually.

It is not smooth sailing when such a RAM page is not available. Then the OS has to dig one up. In your case, there are 4 distinct scenarios I can think of:

  • the page is available but not yet zero-initialized, a job for the low-priority zero-page thread. Should be quick; not a lot of work required.
  • the page needs to be stolen from another process, and the contents of that page do not need to be preserved. This happens for pages that previously contained code, for example. Also pretty quick.
  • the page needs to be stolen, and its contents must be saved to the paging file. This happens for pages that previously contained data, for example. A hard page fault, and that one hurts: the processor is stalled while the disk write takes place.
  • specific to your scenario: the Hyper-V manager decides it needs to take more RAM from the host operating system. All of the previous bullets then apply to that OS as well, plus the overhead of the guest/host interaction. No real idea how much overhead that entails, but it has to be painful.

Which of these bullets you are going to hit is very, very unpredictable. Most of all because it is not only about your program: everything else running on the machine affects it too. And there is a memory effect: something like writing a large file just before you start the test will have a drastic side effect, caused by RAM pages being held by the file system cache while they wait for the disk. Or another process having had an allocation burst and draining the zero-page queue. Or the memory bus getting saturated, which is fairly easy to do and, on a VM, is also affected by the host operating system. Etcetera.

The long and short of it is that profiling this code just is not very meaningful. Anything can and will happen, and you don't have a decent way to predict it, or a good way to do anything about it, other than giving the VM plenty of RAM and not running anything else on it. Profiling a second pass through the array will be far more stable and meaningful, since the OS is no longer involved.
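The two-pass idea can be sketched as follows (again a sketch with assumed names, not a production benchmark): the first pass pays for the OS page-fault handling, the second pass touches pages that are already resident and so measures only the program itself.

```csharp
using System;
using System.Diagnostics;

class TwoPassBenchmark
{
    // Times one full write pass over the array, in ms.
    public static double TimePass(double[] a)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < a.Length; i++)
            a[i] = 0.0;
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        double[] a = new double[1_000_000];
        double first = TimePass(a);  // includes demand-paging / page-fault cost
        double second = TimePass(a); // pages are resident; measures the code alone
        Console.WriteLine($"first pass:  {first:F3} ms");
        Console.WriteLine($"second pass: {second:F3} ms");
    }
}
```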

