Oh my God!
It seems the answer marked here as the ACCEPTED one is essentially incorrect! With all respect, I would ask its author to read the linked article to the end.
The author of that article ran his benchmark only on a dual-core processor, and in the first measurement he acquired the lock from a single thread; the result was about 50 ns per lock acquisition.
That number says nothing about locking under contention. Reading on, in the second half of the article the author measures a contended scenario with two and three threads, which comes closer to the level of concurrency on modern processors.
There he reports that with two threads on the dual core a lock costs 120 ns, and with three threads it rises to 180 ns. So the cost clearly grows with the number of threads competing at the same time, and it only gets worse from there.
In short, it is not 50 ns, except in the single-threaded case, where the lock is pointless anyway.
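To see the difference yourself, here is a minimal sketch (the class name LockCostDemo, the iteration count and the thread counts are my own illustrative choices, not taken from the article) that times a plain ReentrantLock first with one thread and then with several competing threads:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockCostDemo {
    static final int ITERATIONS = 10_000_000;

    // Run roughly ITERATIONS lock/unlock pairs spread across `threads` threads
    // and return the average cost of a single acquisition in nanoseconds.
    static double measure(int threads) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        int perThread = ITERATIONS / threads;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        // empty critical section: we measure only the lock itself
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) perThread * threads);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 2, 3, 4}) {
            System.out.printf("%d thread(s): ~%.1f ns per acquisition%n",
                    threads, measure(threads));
        }
    }
}
```

On a modern many-core machine the per-acquisition cost under contention typically grows well beyond the single-threaded figure, which is exactly the trend the article's two- and three-thread numbers show.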
Another issue to consider is that the result is reported as an average time!
If the time of each individual iteration were measured instead, some would fall anywhere between 1 ms and 20 ms: most iterations are fast, but a few threads end up waiting for CPU time and suffer millisecond-scale delays.
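As a rough illustration of why the average hides this, the following sketch (the class name TailLatencyDemo and the thread and iteration counts are again my own choices) records every individual acquisition time instead of only the total, then prints the average next to the tail:

```java
import java.util.Arrays;
import java.util.LongSummaryStatistics;
import java.util.concurrent.locks.ReentrantLock;

public class TailLatencyDemo {
    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 4, PER_THREAD = 200_000;
        ReentrantLock lock = new ReentrantLock();
        long[][] samples = new long[THREADS][PER_THREAD];

        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < PER_THREAD; i++) {
                    long t0 = System.nanoTime();
                    lock.lock();          // time only the wait for the lock
                    long t1 = System.nanoTime();
                    lock.unlock();
                    samples[id][i] = t1 - t0;
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        // Merge all samples and compare the mean with the tail of the distribution.
        long[] all = Arrays.stream(samples).flatMapToLong(Arrays::stream).toArray();
        LongSummaryStatistics stats = Arrays.stream(all).summaryStatistics();
        Arrays.sort(all);
        System.out.printf("avg = %.0f ns, p99 = %d ns, max = %d ns%n",
                stats.getAverage(), all[(int) (all.length * 0.99)], stats.getMax());
    }
}
```

Even when the average stays in the nanosecond range, the maximum can easily land in the millisecond range whenever a thread is descheduled while waiting for the lock, and that is precisely what the averaged number conceals.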
This is bad news for any application that needs high throughput and low latency.
The last point to consider is that slower operations may run inside the lock, and this is often the case. The longer the code block held under the lock runs, the higher the contention, and the delays grow sky-high.
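A minimal sketch of that effect (the class and method names and the simulated "slow operation" are hypothetical): the longer the work done while holding the lock, the longer every other thread waits, so the usual fix is to do the slow work outside and keep only the shared-state update inside:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class CriticalSectionDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> shared = new ArrayList<>();

    // Anti-pattern: the slow work runs while the lock is held,
    // so every other thread is blocked for its whole duration.
    public void slowInsideLock(int input) {
        lock.lock();
        try {
            String result = expensiveComputation(input); // needs no shared state
            shared.add(result);
        } finally {
            lock.unlock();
        }
    }

    // Better: do the slow work first, hold the lock only for the
    // short update of the shared structure.
    public void slowOutsideLock(int input) {
        String result = expensiveComputation(input);
        lock.lock();
        try {
            shared.add(result);
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for a disk read, network call, heavy calculation, etc.
    private String expensiveComputation(int input) {
        long acc = input;
        for (int i = 0; i < 1_000_000; i++) {
            acc = acc * 31 + i;
        }
        return Long.toString(acc);
    }
}
```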
Keep in mind that more than a decade has passed since 2003: several generations of processors designed specifically for parallel execution have appeared, and locking significantly degrades their performance.