While reading Understanding the Linux Kernel (Bovet & Cesati), I found that the chapter "Kernel Synchronization" says the spin lock acquisition code boils down to the following:
    1: lock; btsl $0, slp
       jnc 3f
    2: testb $1, slp
       jne 2b
       jmp 1b
    3:
Now, I initially thought the nested loops seemed wasteful, and that you could implement it as something like:
    1: lock; btsl $0, slp
       jc 1b
which would be much simpler. However, I understand why they did it this way: the lock prefix affects other CPUs, and btsl takes longer than a simple testb.
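To check that I am reading the two acquisition variants correctly, here is a rough C11 sketch of both. The type `sketch_spinlock` and the function names are my own, made up for illustration, and the `<stdatomic.h>` calls only stand in for the lock-prefixed btsl; this is not the kernel's code.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration: bit 0 set means "held". */
typedef struct {
    atomic_uint slp;
} sketch_spinlock;

/* Single-loop variant: repeat the atomic read-modify-write until the
   old value of bit 0 was clear. Every iteration is a locked operation. */
void sketch_lock_naive(sketch_spinlock *l)
{
    /* The return value plays the role of the carry flag after btsl. */
    while (atomic_fetch_or_explicit(&l->slp, 1u, memory_order_acquire) & 1u)
        ;
}

/* Nested-loop variant, as I understand the book's version: after a
   failed attempt, spin on ordinary reads and only retry the locked
   operation once the lock looks free. */
void sketch_lock(sketch_spinlock *l)
{
    for (;;) {
        /* Locked attempt (the btsl / jnc step). */
        if (!(atomic_fetch_or_explicit(&l->slp, 1u, memory_order_acquire) & 1u))
            return; /* old bit was 0, so the lock is now ours */

        /* Read-only spin (the testb / jne step). */
        while (atomic_load_explicit(&l->slp, memory_order_relaxed) & 1u)
            ;
    }
}
```

Both functions leave bit 0 set on return; the only difference is how many locked operations are issued while waiting.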
The one thing I have not been able to work out is the subsequent release of the spin lock. The book gives it as:
    lock; btrl $0, slp
My question is mainly why? It seems to me that the lock/mov-immediate combination is faster.
You do not need to get the old state into the carry flag because, following the rule that the kernel is bug-free (which is assumed in plenty of other places within said kernel), the old state will be 1 (you would not be trying to release the lock if you had not already acquired it).
And mov is much faster than btrl, at least on the 386.
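To make the comparison concrete, here is a small C11 sketch of the two release styles I have in mind. The function names are hypothetical, and I am assuming the lock word holds nothing except bit 0, so that storing zero and clearing bit 0 leave the same value. Whether the plain-store version is sufficient on real x86 hardware is exactly what I am asking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Release in the style of "btrl $0, slp" with a lock prefix: an atomic
   read-modify-write that clears bit 0 and reports the old bit, which is
   what the carry flag would hold. */
bool release_with_rmw(atomic_uint *slp)
{
    unsigned old = atomic_fetch_and_explicit(slp, ~1u, memory_order_release);
    return (old & 1u) != 0;
}

/* Release in the style of a mov-immediate: overwrite the word with 0
   and ignore the old state. Assumes only bit 0 of the word is used. */
void release_with_store(atomic_uint *slp)
{
    atomic_store_explicit(slp, 0u, memory_order_release);
}
```

If the caller really does hold the lock, `release_with_rmw` always returns true, which is why the old state looks unnecessary to me.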
So what am I missing?
Have the timings for these instructions changed on later chips?
Has the kernel been updated since the book was published?
Is the book simply wrong (or showing simplified instructions)?
Have I missed some other aspect of synchronization between CPUs that the faster instruction does not satisfy?
x86 linux-kernel spinlock
paxdiablo