Re: possible spinlock optimizations

Ingo Molnar (mingo@chiara.csoma.elte.hu)
Tue, 28 Sep 1999 17:06:57 +0200 (CEST)


On Tue, 28 Sep 1999, Manfred wrote:

> I think there are 2 possible optimizations for spinlocks:
> 1) spin_lock_irq(): What about enabling interrupts while spinning?

you are right technically, but the point is that if spinlock contention is
high then there is something wrong with that lock. Spinlocks are supposed
to be held for short periods of time, because they imply a huge waste of
CPU power under contention: the spinning CPU is doing nothing useful in
most cases.

> I'm only talking about spin_lock_irq(), obviously this is impossible for
> spin_lock_irqsave().

(even in this case it's not impossible - the saved flags tell you whether
interrupts were enabled before, so the slow path could restore them while
spinning)

Note that historically we did exactly what you are proposing in the 2.0
(or 2.1?) kernels: lock_kernel() did a sti in its slow path, because at
that time we held the kernel lock for many milliseconds, so it became an
interactivity issue. This hack (of mine) was removed in 2.2.
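
In sketch form the idea looks something like this (purely illustrative,
not the old 2.0 code - spin_trylock() and spin_is_locked() here just
stand for the usual interlocked attempt and the plain-read test):

	/* sketch: take the lock with interrupts disabled, but let
	 * interrupts in while waiting. An irqsave variant would restore
	 * the saved flags instead of doing an unconditional sti. */
	static inline void spin_lock_irq_sketch(spinlock_t *lock)
	{
		__cli();
		while (!spin_trylock(lock)) {
			__sti();		/* service pending irqs */
			while (spin_is_locked(lock))
				;		/* plain-read spin */
			__cli();		/* close the window, retry */
		}
	}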

> 2) Have you read this note about W2K spinlock?
> http://www.numega.com/drivercentral/resources/spinlocks.shtml
> They claim their implementation has better SMP memory bus characteristics.
> Has anyone tried to implement something like this?

they are fixing the symptoms. Windows NT apparently has problems with high
contention spinlocks. So instead of reducing the number of spinlocks and
reducing lock collisions, they decided to optimize the 'contention case'.
This is a step in the wrong direction, as it makes the 'no contention'
case slower. No amount of trickery is going to keep the spinning CPU from
wasting precious cycles on pure spinning! The 'no contention' case in
Linux is highly optimized: it's only 2 inlined assembly instructions to
acquire, and 1 inlined assembly instruction to release. Windows NT is
digging itself into a deeper and deeper architectural hole, methinks ...
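
For reference, the i386 fast path looks roughly like this (a sketch
paraphrased from memory of the 2.3 <asm/spinlock.h>, details may differ).
The lock byte starts out at 1 (free), and the spin loop gets assembled
into a separate section, out of the hot instruction stream:

	typedef struct {
		volatile unsigned char lock;	/* 1 == free */
	} spinlock_sketch_t;

	static inline void spin_lock_sketch(spinlock_sketch_t *s)
	{
		__asm__ __volatile__(
			"\n1:\t"
			"lock ; decb %0\n\t"	/* 1 -> 0: got it */
			"js 2f\n"		/* went negative: contended */
			".section .text.lock,\"ax\"\n"
			"2:\t"
			"cmpb $0,%0\n\t"	/* spin with plain reads */
			"jle 2b\n\t"
			"jmp 1b\n"		/* looks free: retry decb */
			".previous"
			: "+m" (s->lock) : : "memory");
	}

	static inline void spin_unlock_sketch(spinlock_sketch_t *s)
	{
		__asm__ __volatile__("movb $1,%0"
			: "+m" (s->lock) : : "memory");
	}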

Linux is doing another trick here to reduce bus traffic if lock contention
happens: once we go into the slow path we do not use interlocked atomic
instructions to poll the state of the spinlock flag, but normal memory
access instructions - this also ensures that we generate cross-cache bus
traffic only when the spinlock is released. This way we basically get the
kind of 'good' (nonintrusive) bus traffic that queued spinlocks are
primarily supposed to provide - without the overhead of queued spinlocks.
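
Spelled out in C, the slow-path logic is (sketch only - try_acquire()
here stands for the 'lock'-prefixed interlocked attempt):

	extern int try_acquire(volatile signed char *lock); /* hypothetical */

	static void spin_wait_sketch(volatile signed char *lock)
	{
		do {
			while (*lock <= 0)
				;	/* plain reads: each waiter spins in
					 * its own cache line, no bus locking */
		} while (!try_acquire(lock));	/* interlocked, may fail */
	}

only the releasing store invalidates the waiters' cache lines, so that is
the only moment bus traffic gets generated.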

(btw. NT doesn't do the kind of out-of-line spinlock slow-path thing Linux
does, no wonder they see high spinlock overhead. NT calls functions to
acquire/release spinlocks, yuck!)

NT's designers, i believe, have also made a mistake with the
'processor-local area' thing, which they currently implement through a
special segment and %fs. Not only is it slower on x86 (%fs access needs a
bigger opcode and doesn't optimize as well within the CPU), but they'll
also have to waste a whole register on it on IA64 ... Apparently that
David Cutler guy has already left the building? ;)
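
For contrast, the array-indexed style Linux uses (a sketch - my_cpu_state
is illustrative, not a real kernel symbol): access compiles to plain
address arithmetic off smp_processor_id(), with no segment-override
prefix and no permanently reserved register:

	struct cpu_state_sketch {
		unsigned long irq_count;
		/* ... other per-CPU state ... */
	} my_cpu_state[NR_CPUS];

	#define this_cpu_state()  (&my_cpu_state[smp_processor_id()])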

in the semaphore case the picture is completely different - semaphores
(mutexes) naturally want the kind of queueing characteristics described
above. Linux 2.3 does this and more (we do wake-one on semaphores). Linux
semaphores are still very lightweight in the no contention case: 2 inlined
assembly instructions to acquire and 2 inlined assembly instructions to
release.
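
In C the shape of that fast path is roughly this (sketch - the real code
is inline asm, and atomic_dec_return()/atomic_inc_return() here just
stand for the 'lock ; decl' / 'lock ; incl' instructions):

	static inline void down_sketch(struct semaphore *sem)
	{
		if (atomic_dec_return(&sem->count) < 0)
			__down(sem);	/* out of line: sleep until woken */
	}

	static inline void up_sketch(struct semaphore *sem)
	{
		if (atomic_inc_return(&sem->count) <= 0)
			__up(sem);	/* out of line: wake one waiter */
	}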

anyway, Linux tries to have the highest quality SMP core architecture
physically possible - if you can poke holes in it, feel free!

-- mingo
