Peterson's algorithm does not work well on modern architectures with cached memory. You end up having to flush constantly (in practice, issue memory barriers) so that each thread sees the other's writes. Test-and-set and interlocked operations, such as interlocked exchange or interlocked increment, are used far more often and have direct processor support.
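As a rough illustration of the interlocked approach, here is a minimal sketch in C++ using `std::atomic`, which I am assuming as a portable stand-in for interlocked exchange and interlocked increment. The names `SpinLock`, `run_locked` and `run_atomic` are made up for this example and do not come from any library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock built on an atomic exchange (test-and-set style).
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // exchange() returns the previous value, so the loop ends only
        // for the thread that actually flipped the flag from false to true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};

// Increment a plain counter under the spinlock from several threads.
long run_locked(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// Same work with an atomic increment and no lock at all.
long run_atomic(int threads, int iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Calling `run_locked(4, 10000)` or `run_atomic(4, 10000)` should both give 40000. The acquire/release ordering on the lock is what replaces the explicit flushing that a hand-written Peterson lock would need.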