.NET and Windows synchronization primitive performance figures

I am currently writing a scientific article in which I need to cite my sources precisely. Can someone point me to MSDN (an MSDN article), some other published source, or a book that compares the performance of the Windows and .NET synchronization primitives?

I know that, in descending order of performance, they rank as follows: Interlocked API, critical section, .NET lock statement, Monitor, Mutex, EventWaitHandle, Semaphore.

Many thanks,
Hovhannes

PS: I found a great book, Concurrent Programming on Windows by Joe Duffy. The author is one of the developers who worked on concurrency in the .NET Framework, and the book is excellent, with plenty of explanation of how everything works and how it was implemented.

multithreading windows concurrency winapi
4 answers

For a rough comparison, the following figures from the MSDN article Lockless Programming Considerations for Xbox 360 and Microsoft Windows may come in handy:


The performance of synchronization instructions and functions on Windows varies widely depending on the processor type and configuration, and on what other code is running. Multi-core and multi-socket systems often take longer to execute synchronizing instructions, and acquiring a lock takes much longer if another thread currently owns it.

However, even some measurements obtained from very simple tests are useful:

  • MemoryBarrier was measured as taking 20-90 cycles.
  • InterlockedIncrement was measured as taking 36-90 cycles.
  • Acquiring or releasing a critical section was measured as taking 40-100 cycles.
  • Acquiring or releasing a mutex was measured as taking 750-2500 cycles.

These tests were performed on Windows XP on a range of processors. The shorter times were measured on a single-processor machine and the longer times on a multiprocessor machine.
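If you want to reproduce numbers of this kind on your own hardware, a minimal sketch along the following lines may help. It is written in portable C++ rather than against Win32, so it times rough standard-library analogs: std::atomic_thread_fence for MemoryBarrier, std::atomic::fetch_add for InterlockedIncrement, and an uncontended std::mutex lock/unlock pair standing in for a critical section (on common implementations the uncontended std::mutex path stays in user mode, unlike a Win32 mutex kernel object). All helper names are hypothetical. It reports nanoseconds rather than cycles (multiply by the clock rate in GHz for approximate cycles), times acquire and release together, and covers only the single-threaded, uncontended case.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: average nanoseconds per call of `op`, one thread, no contention.
template <typename Op>
double ns_per_op(Op op, std::int64_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        op();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

struct UncontendedCosts {
    double fence_ns;      // std::atomic_thread_fence: rough analog of MemoryBarrier
    double increment_ns;  // std::atomic fetch_add: rough analog of InterlockedIncrement
    double mutex_ns;      // std::mutex lock + unlock pair (not a Win32 mutex object)
};

UncontendedCosts measure_uncontended(std::int64_t iterations) {
    std::atomic<long> counter{0};
    std::mutex m;
    UncontendedCosts costs{};
    costs.fence_ns = ns_per_op(
        [] { std::atomic_thread_fence(std::memory_order_seq_cst); }, iterations);
    costs.increment_ns = ns_per_op(
        [&] { counter.fetch_add(1, std::memory_order_seq_cst); }, iterations);
    costs.mutex_ns = ns_per_op([&] { m.lock(); m.unlock(); }, iterations);
    return costs;
}
```

Calling measure_uncontended(1000000) and printing the three fields gives one data point for one machine; repeat it on the hardware you actually care about rather than treating any single run as authoritative.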

I doubt you will find definitive numbers for them: they vary with the OS and the CPU, and from one situation to another.

It is not really meaningful to compare the performance of these primitives directly, because they do different things: an EventWaitHandle behaves differently from a critical section, so you cannot compare their performance head to head. You will also find that they perform differently in different situations: a critical section is faster than a mutex for uncontended acquisition, but performs similarly under contention. Some of these primitives perform terribly under heavy contention, while others cope much better.
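To see the contention effect described above, a sketch like the following times the same lock-protected increment with one thread and with several. It uses std::mutex as a stand-in, which is an assumption: it is neither a Win32 critical section nor a Win32 mutex. The function and struct names are hypothetical, and the timing includes thread start-up, so treat the numbers as rough comparisons only.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct ContentionResult {
    long final_count;       // should equal threads * iterations_per_thread
    double ns_per_acquire;  // wall-clock time divided by total acquisitions
};

// Hypothetical helper: `threads` workers all increment one shared counter under one mutex.
ContentionResult run_mutex_contention(int threads, long iterations_per_thread) {
    std::mutex m;
    long shared = 0;  // protected by m
    auto worker = [&] {
        for (long i = 0; i < iterations_per_thread; ++i) {
            std::lock_guard<std::mutex> guard(m);
            ++shared;
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto& th : pool) {
        th.join();
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    double total = static_cast<double>(threads) * static_cast<double>(iterations_per_thread);
    return {shared, elapsed.count() / total};
}
```

On a multi-core machine, comparing run_mutex_contention(1, N).ns_per_acquire with run_mutex_contention(4, N).ns_per_acquire usually shows the per-acquisition cost rising once threads compete, but how much it rises depends on the lock implementation and the hardware.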

I recommend writing a test program to measure performance yourself: it does not take long to write and time each of these primitives, and you will then be able to answer any questions about the numbers in your article.
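Such a test program does not need much scaffolding. The hypothetical helper below (a sketch, not a standard API) runs one untimed warm-up batch, then several timed batches, and reports the median time per call, which is less sensitive to one-off interruptions such as context switches than a single run or the mean.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: median nanoseconds per call of `op` over `runs` timed batches
// of `batch` calls each (upper median when `runs` is even). Requires runs >= 1, batch >= 1.
template <typename Op>
double median_ns_per_op(Op&& op, int runs, long batch) {
    // Untimed warm-up batch: lets caches, branch predictors and lazy initialization settle.
    for (long i = 0; i < batch; ++i) {
        op();
    }

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        for (long i = 0; i < batch; ++i) {
            op();
        }
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double, std::nano> elapsed = stop - start;
        samples.push_back(elapsed.count() / static_cast<double>(batch));
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

For example, median_ns_per_op([&] { m.lock(); m.unlock(); }, 11, 100000) with a std::mutex m gives one figure per primitive; taking the operation as a template parameter avoids adding an indirect call (as std::function would) to every timed iteration.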

Behavior:

  • It is not a simple top-down ranking, because some primitives do more work than others.
  • The cost depends on the architecture of the processor you are running on, the number of cores in the system, and the version of Windows.

Some notes:

  • The lock statement is syntactic sugar for the Monitor class.
  • Many of these are very thin wrappers around the underlying Win32 API calls, often invoked directly through P/Invoke. Some of those calls are in turn thin wrappers around atomic multiprocessor instructions.
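In C#, lock (obj) { ... } compiles to Monitor.Enter and Monitor.Exit, with Exit in a finally block so the lock is released even if the body throws. C++ has no finally, so the sketch below (all names hypothetical) shows the same idea with std::mutex: an explicit lock/unlock with a catch-and-rethrow standing in for finally, next to the RAII std::lock_guard form that plays the role of the syntactic sugar. It illustrates the shape of the expansion only and says nothing about how Monitor itself is implemented.

```cpp
#include <mutex>
#include <stdexcept>

std::mutex gate;          // hypothetical lock object
int protected_value = 0;  // state guarded by `gate`

// Explicit form: acquire, run the body, release on both the normal and the exceptional path.
void increment_explicit(bool fail) {
    gate.lock();
    try {
        ++protected_value;
        if (fail) {
            throw std::runtime_error("body failed");
        }
    } catch (...) {
        gate.unlock();  // plays the role of the finally block
        throw;
    }
    gate.unlock();
}

// Sugared form: std::lock_guard releases the mutex in its destructor, even during unwinding.
void increment_guarded(bool fail) {
    std::lock_guard<std::mutex> guard(gate);
    ++protected_value;
    if (fail) {
        throw std::runtime_error("body failed");
    }
}

// True if nobody holds `gate`; must not be called while the calling thread holds it.
bool gate_is_free() {
    if (gate.try_lock()) {
        gate.unlock();
        return true;
    }
    return false;
}
```

Both versions leave the mutex free after a throwing body; the guarded one simply makes it impossible to forget the release, which is the same convenience the C# lock statement provides over calling Monitor directly.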

The lower-level the primitive, the more its cost depends on the underlying hardware. For example, cache-line transfers and invalidations between cores within a single package or NUMA node can be much faster than on older front-side-bus (FSB) SMP systems.
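One way to observe this hardware dependence is false sharing: two threads each increment their own counter, but if both counters sit in the same cache line, every write invalidates the other core's copy of that line. The sketch below assumes a 64-byte cache line (common on current x86, but not universal) and compares adjacent counters against counters padded onto separate lines. The type and function names are hypothetical, and the size of the gap, if any, will vary with the processor and its topology.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Both counters in one (assumed 64-byte) cache line: the struct itself is line-aligned.
struct alignas(64) SharedLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Each counter on its own (assumed 64-byte) cache line.
struct PaddedLines {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

// Two threads, one counter each; returns elapsed wall-clock milliseconds.
template <typename Counters>
double run_two_writers_ms(Counters& c, long iterations) {
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] {
        for (long i = 0; i < iterations; ++i) {
            c.a.fetch_add(1, std::memory_order_relaxed);
        }
    });
    std::thread t2([&] {
        for (long i = 0; i < iterations; ++i) {
            c.b.fetch_add(1, std::memory_order_relaxed);
        }
    });
    t1.join();
    t2.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The two threads never touch each other's counter, so any slowdown of the SharedLine run relative to the PaddedLines run comes from cache-coherence traffic, which is exactly the part of the cost that differs between a single-package system and an older FSB-based SMP machine.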

Finding specific numbers is difficult, and I strongly recommend benchmarking the locks in your own scenario, because performance will depend on factors such as access patterns, contention, and the hardware the code runs on. I would also recommend including the spinning locks introduced in .NET 4.0, System.Threading.SpinLock and System.Threading.SemaphoreSlim, in your comparison.
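For intuition about what a spinning lock does, here is a minimal test-and-set spin lock in C++. It is a hypothetical sketch, not System.Threading.SpinLock (which has its own back-off strategy and optional thread-ownership tracking); this version yields while waiting rather than spinning purely, and like any spin lock it is only suitable for very short critical sections.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: true means another thread holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // give up the time slice instead of busy-looping
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical driver: `threads` workers increment one counter under the spin lock.
long count_with_spinlock(int threads, long iterations_per_thread) {
    SpinLock lock;
    long shared = 0;  // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < iterations_per_thread; ++i) {
                lock.lock();
                ++shared;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return shared;
}
```

Timing count_with_spinlock against the std::mutex version of the same loop, at several thread counts, is a quick way to see where spinning pays off (short, lightly contended sections) and where it does not.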

That said, Joe Duffy has written several blog posts comparing the performance of individual locks, like this one.
