Re: About a change to the implementation of spin lock in 2.6.12 kernel.

From: Willy Tarreau
Date: Thu Jul 14 2005 - 00:24:12 EST


Hi,

On Wed, Jul 13, 2005 at 07:20:06PM -0700, multisyncfe991@xxxxxxxxxxx wrote:
> Hi,
>
> I found _spin_lock used a LOCK instruction to make the following
> operation "decb %0" atomic. As you know, LOCK instruction alone takes
> almost 70 clock cycles to finish and this add lots of cost to the
> _spin_lock. However _spin_unlock does not use this LOCK instruction and
> it uses "movb $1,%0" instead since 4-byte writes on 4-byte aligned
> addresses are atomic.

_spin_unlock does not need locked operations because when it is run, the
code is already known to be the only one to hold the lock, so it can
release it without checking what others do.

> So I want rewrite the _spin_lock defined spinlock.h
> (/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock
> and make it more efficient.

It does not work. You cannot write an inter-cpu atomic test-and-set with
several unlocked instructions.

> #define spin_lock_string \
> "\n1:\t" \
> "cmpb $0,%0\n\t" \
> "jle 2f\n\t" \

==> here, another thread or CPU can get the lock simultaneously.

> "movb $0, %0\n\t" \
> "jmp 3f\n" \
> "2:\t" \
> "rep;nop\n\t" \
> "cmpb $0, %0\n\t" \
> "jle 2b\n\t" \
> "jmp 1b\n" \
> "3:\n\t"
>
> Compared with the original version as follows, LOCK instruction is
> removed. I rebuilt the Intel e1000 Gigabit driver with this _spin_lock.
> There is about 2% throughput improvement.
> #define spin_lock_string \
> "\n1:\t" \
> "lock ; decb %0\n\t" \
> "jns 3f\n" \
> "2:\t" \
> "rep;nop\n\t" \
> "cmpb $0,%0\n\t" \
> "jle 2b\n\t" \
> "jmp 1b\n" \
> "3:\n\t"
>
> Do you think I can get a better performance if I dig further?
>
> Any ideas will be greatly appreciated,

well, of course with those methods you can improve performance, but you
lose the warranty that you're alone to get a lock, and that's bad.

another similar method to get a lock in some very controlled environment
is as follows :

1: cmp $0, %0
jne 1b
mov $CPUID, %0
membar
cmp $CPUID, %0
jne 1b

This only works with same speed CPUs and interrupts disabled. But in todays
environments, this is very risky (hyperthreaded CPUs, etc...). However, this
is often OK for more deterministic CPUs such as microcontrollers.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/