Re: kernel lock contention and scalability

From: Tim Wright (timw@splhi.com)
Date: Tue Mar 06 2001 - 19:28:18 EST


On Tue, Mar 06, 2001 at 11:39:17PM +0000, Matthew Kirkwood wrote:
> On Tue, 6 Mar 2001, Jonathan Lahr wrote:
>
> [ sorry to reply over another reply, but I don't have
> the original of this ]
>
> > > Tridge and I tried out the postgresql benchmark you used here and this
> > > contention is due to a bug in postgres. From a quick strace, we found
> > > the threads do a load of select(0, NULL, NULL, NULL, {0,0}).
>
> I can shed some light on this (though I'm far from a PG hacker).
>
> Postgres can use either of two locking methods -- SysV semaphores
> (which it tries to avoid, assuming that they'll be too heavy) or
> userspace spinlocks (via inline assembler on platforms which support
> it).
>
> In the slow path of a spinlock_acquire they busy wait for a few
> cycles, and then call select() with a zero timeout, assuming that
> it'll basically do the same as a sched_yield() but more portably.
>

Ugh !
I had a nasty feeling that might be what they were up to. The reason for
the "ugh" is as follows. If you're a UP system, it never makes sense to
spin in userland, since you'll just burn up a timeslice and prevent the
lock holder from running. I haven't looked, but assume that their code only
uses spinlocks on SMP. If you're an SMP system, then you shouldn't be
using a spinlock unless the critical section is "short", in which case the
waiters should simply spin in userland rather than making system calls, which
are pure overhead. If the argument is that the "spinners" take too much
useful time away from other processes, then it sounds like either the
contention is too high or the critical section is long enough that
semaphores would be a better choice.
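
To make the pattern under discussion concrete, here is a minimal sketch in
C11 of a userland lock that spins a bounded number of times and then gives
up the CPU. It is not PostgreSQL's actual code: spin_t, spin_try,
spin_acquire, spin_release, SPIN_LIMIT and yield_via_select are names made
up for illustration, and the spin limit is arbitrary. yield_via_select()
reproduces the select(0, NULL, NULL, NULL, {0,0}) call seen in the strace;
the lock itself uses sched_yield() instead.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical names throughout; this is an illustration only. */
#define SPIN_LIMIT 100          /* arbitrary bound on the busy-wait */

typedef struct {
    atomic_flag held;           /* set while some thread owns the lock */
} spin_t;

/* One attempt to take the lock; returns 1 on success, 0 if already held. */
static int spin_try(spin_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* The idiom from the strace: a select() with no descriptors and a zero
 * timeout. It returns immediately and costs a system call every time. */
static int yield_via_select(void)
{
    struct timeval tv = { 0, 0 };
    return select(0, NULL, NULL, NULL, &tv);
}

/* Spin in userland first, then fall back to sched_yield().
 * Returns how many times the CPU had to be given up. */
static unsigned spin_acquire(spin_t *l)
{
    unsigned yields = 0;

    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (spin_try(l))
                return yields;
        }
        sched_yield();          /* the system call in the slow path */
        yields++;
    }
}

static void spin_release(spin_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock is declared as "spin_t l = { ATOMIC_FLAG_INIT };". If spin_acquire()
regularly returns a non-zero count, the waiters are paying for system calls
in the slow path, which suggests that the contention is too high or the
critical section too long for a spinlock.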

Actually, what's really needed here is an efficient way of dynamically
marking a process as non-preemptible, so that a process acquiring a spinlock
can be sure of getting through the critical section as fast as possible, at
which point it would relinquish its non-preemptible privilege.

Another synchronization method popular with database peeps is "post/wait"
for which SGI have a patch available for Linux. I understand that this is
relatively "light weight" and might be a better choice for PG.
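
For readers who haven't met the idea, the general shape of a blocking
"post"/"wait" pair can be sketched with POSIX unnamed semaphores. This is
only a rough analogue and an assumption on my part: it is not the interface
of the SGI patch, and waiter_t, waiter_init, waiter_wait and waiter_post are
hypothetical names.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical wrapper; NOT the interface of the SGI post/wait patch. */
typedef struct {
    sem_t sem;
} waiter_t;

/* pshared = 0 means the semaphore is shared between threads of one
 * process. For separate processes the struct would have to live in
 * shared memory and be initialised with pshared = 1. */
static int waiter_init(waiter_t *w)
{
    return sem_init(&w->sem, 0, 0);
}

/* Sleep in the kernel until someone posts: no spinning and no polling.
 * Restart the wait if a signal handler interrupts it. */
static int waiter_wait(waiter_t *w)
{
    int rc;

    do {
        rc = sem_wait(&w->sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Wake one waiter, or let the next waiter_wait() return immediately. */
static int waiter_post(waiter_t *w)
{
    return sem_post(&w->sem);
}
```

The point of the contrast with a spinlock is that a waiter consumes no CPU
while it is blocked, at the cost of a system call on each wait and each post.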

Tim

-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI