If you do it the first way, you need to make sure that the compiler really reads the variable from memory on every iteration and does not optimize the read away on the grounds that the value cannot change inside the loop. Declaring the variable "volatile" is necessary for this.
But this alone is not enough. To ensure that changes made to the variable in one thread are visible to the others, you need some kind of memory barrier, so that stores and loads are not reordered by the processor and cache. On x86 you will probably get away without one. But if you want to do this kind of thing, you are much better off using compiler intrinsics such as InterlockedIncrement (on Windows, or the equivalent on other platforms).
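As a portable counterpart to the flag-plus-barrier approach described above, here is a minimal sketch using C++11 `std::atomic`. This is my substitution, not something named in the answer, and the names `ready`, `payload`, `producer`, `consumer` and `run_flag_demo` are made up for illustration. The release store and acquire load play the role of the memory barrier: the compiler must re-read the flag on every iteration, and the write to `payload` cannot become visible after the flag does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: `ready` is the polled flag, `payload` is the data it guards.
std::atomic<bool> ready{false};
int payload = 0;

void producer() {
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // publish: the write above is ordered before this store
}

int consumer() {
    // The acquire load is re-executed on every iteration, so the loop
    // cannot be collapsed into a single read.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();                 // give up the time slice while waiting
    }
    return payload;                                // ordered after the acquire load that saw `true`
}

int run_flag_demo() {
    ready.store(false);
    payload = 0;
    int seen = 0;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer(producer);
    reader.join();
    writer.join();
    assert(seen == 42);  // the consumer must observe the published payload
    return seen;
}
```

Calling `run_flag_demo()` starts both threads and returns what the consumer read. Note that this is still a busy wait, so it only makes sense when the wait is expected to be very short.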
In almost all cases you are better off using a condition variable or a spin lock from a library (which is essentially what you are trying to implement), because they will get the details right for multi-core operation.
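To illustrate the library route, here is a rough sketch with `std::condition_variable` from the C++ standard library. It assumes C++11 or later, and `Signal`, `publish`, `wait_for` and `run_cv_demo` are hypothetical names, not taken from the question. The waiting thread sleeps instead of spinning, and the mutex provides the ordering that the hand-rolled version needs barriers for.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper type bundling the flag, the guarded data and the synchronization objects.
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
};

void publish(Signal& s, int value) {
    {
        std::lock_guard<std::mutex> lock(s.m);  // modify shared state only under the mutex
        s.payload = value;
        s.ready = true;
    }
    s.cv.notify_one();                           // wake the waiter after releasing the lock
}

int wait_for(Signal& s) {
    std::unique_lock<std::mutex> lock(s.m);
    // The predicate form re-checks `ready` after every wakeup, so spurious
    // wakeups and a notify that happens before the wait are both handled.
    s.cv.wait(lock, [&] { return s.ready; });
    return s.payload;
}

int run_cv_demo() {
    Signal s;
    int seen = 0;
    std::thread waiter([&] { seen = wait_for(s); });
    std::thread setter([&] { publish(s, 7); });
    waiter.join();
    setter.join();
    assert(seen == 7);  // the waiter must see the value stored before the notification
    return seen;
}
```

The same shape works regardless of which thread starts first, because the waiter checks `ready` under the lock before it ever blocks.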