Are Linux SMP spinlocks too slow?

Question

After reading Understanding the Linux Kernel (Bovet & Cesati), I noticed that the chapter "Kernel Synchronization" says the spin lock acquisition code boils down to the following:

    1: lock:
       btsl    $0, slp
       jnc     3
    2: testb   $1, slp
       jne     2
       jmp     1
    3:

Now, I initially thought it seemed wasteful to have nested loops, and that you could implement something like:

    1: lock:
       btsl    $0, slp
       jc      1

which would be much simpler. However, I understand why they did this: the lock prefix affects other CPUs, and the timings for btsl are greater than those for a simple testb.

The one thing I could not work out was the subsequent release of the spin lock. The book says that it comes down to the following:

    lock:
       btrl    $0, slp

My question is mainly: why? It seems to me that a lock/mov-immediate combination would be faster.

You do not need to get the old state into the carry flag because, given the rule that the kernel is bug-free (which is assumed in plenty of other places within said kernel), the old state will be 1 (you would not be trying to release the lock if you had not acquired it).

And mov is much faster than btrl, at least on the 386.
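To make the comparison concrete, here is a rough sketch of the same idea in C11 atomics rather than assembly. The type and function names (`ttas_lock`, `ttas_acquire`, `ttas_release`) are made up for illustration and are not the kernel's. The acquire path mirrors the nested loops from the book, and the release is the plain store I have in mind instead of a locked read-modify-write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only; not the kernel's spinlock_t. */
typedef struct {
    atomic_bool locked;
} ttas_lock;

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        /* Atomic read-modify-write, playing the role of "lock; btsl":
         * sets the flag and returns its previous value. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return; /* flag was clear, so the lock is now ours */

        /* Read-only spin, playing the role of the inner testb loop:
         * wait until the flag looks clear before retrying the exchange. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static void ttas_release(ttas_lock *l)
{
    /* The proposed release: a store of the "unlocked" value, with no need
     * to read the old state back. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

On x86 a release store like this normally compiles to an ordinary mov, which is the instruction I would have expected in place of btrl.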

So what am I missing?

Have the timings for these instructions changed on subsequent chips?

Has the kernel been updated since the publication of the book?

Is the book simply wrong (or shows simplified instructions)?

Have I missed some other aspect of synchronization between CPUs that the faster instruction does not satisfy?

x86 linux-kernel spinlock

paxdiablo Jan 19 '11 at 8:41

1 answer

Michael Foukarakis · Accepted Answer · 2011-01-19T15:41:04+0000

Well, Understanding the Linux Kernel is old. Since it was written, the Linux kernel has been updated to use so-called ticket spinlocks. The lock is basically a 16-bit quantity split into two bytes: call one Next (like the next ticket in a take-a-number dispenser) and the other Owner (like the "Now Serving" number above the counter). A spinlock is initialized with both parts set to zero. Lock notes the value of the spinlock and increments Next, atomically. If the value of Next before the increment equals Owner, the lock has been obtained. Otherwise, it spins until Owner has been incremented to the right value.

The corresponding code is in asm/spinlock.h (for x86). The unlock operation is indeed much faster and simpler than the book says:

    static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
    {
        asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
                     : "+m" (lock->slock)
                     :
                     : "memory", "cc");
    }

since inc is about 8 or 9 times faster than btr.
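As a rough illustration of the ticket scheme described above, here is a minimal sketch in portable C11 atomics. It is not the kernel's code: the `ticket_lock` type and both function names are invented for this example, and it keeps Next and Owner as two separate bytes instead of packing them into one 16-bit word the way the kernel's slock does.

```c
#include <stdatomic.h>

/* Hypothetical type for illustration; the kernel packs both bytes
 * into a single 16-bit slock field. */
typedef struct {
    atomic_uchar next;   /* next ticket to hand out */
    atomic_uchar owner;  /* ticket currently being served */
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l)
{
    /* Take a ticket: atomically read Next and increment it. */
    unsigned char me =
        atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Spin until the "Now Serving" number reaches our ticket. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_lock_release(ticket_lock *l)
{
    /* Serve the next ticket. This plays the role of the incb above;
     * an atomic increment is used here to keep the sketch simple. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Both counters wrap around at 256, which is harmless as long as fewer than 256 CPUs are ever queued on the lock at once. A nice side effect of handing out tickets is that waiters get the lock in the order they asked for it.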

Hope this helps; if not, I would be happy to dig deeper.
