Re: [RFC][PATCH 05/17] x86: Optimize arch_spin_unlock_wait()

From: Peter Zijlstra
Date: Mon Jan 03 2011 - 06:32:52 EST


On Fri, 2010-12-24 at 10:26 -0800, Linus Torvalds wrote:
> On Fri, Dec 24, 2010 at 4:23 AM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> > Only wait for the current holder to release the lock.
> >
> > spin_unlock_wait() can only be about the current holder, since
> > completion of this function is inherently racy with new contenders.
> > Therefore, there is no reason to wait until the lock is completely
> > unlocked.
>
> Is there really any reason for this patch? I'd rather keep the simpler
> and more straightforward code unless you have actual numbers.

No numbers; the testcase I use for this series is too unstable to give
results at that granularity. It's more a result of looking at the code
and going: "oohh, that can wait a long time when the lock is severely
contended".

But I think I can get rid of the need for calling this primitive
altogether, which is even better.

> > +static inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
> > +{
> > + int tmp = ACCESS_ONCE(lock->slock);
> > +
> > + if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))
> > + return; /* not locked */
> > +
> > + /* wait until the current lock holder goes away */
> > + while ((lock->slock & TICKET_MASK) == (tmp & TICKET_MASK))
> > + cpu_relax();
> > }
>
> Also, the above is just ugly. You've lost the ACCESS_ONCE() on the
> lock access, and it's using another model of masking than the regular
> one. Both of which may be intentional (maybe you are _trying_ to get
> the compiler to just load the low bytes and avoid the 'and'), but the
> whole open-coding of the logic - twice, and with different looking
> masking - just makes my skin itch.

I'm not sure I fully understand the complaint here. The ACCESS_ONCE is
for the tmp variable: we use tmp several times, so it must hold a single
load of the lock word and must not be optimized back into multiple
loads.
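
For reference, ACCESS_ONCE() is essentially just a volatile cast (this
is roughly its definition in include/linux/compiler.h):

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

Without it the compiler is free to re-load lock->slock at every use of
tmp, and the two conditionals below could then test two different
snapshots of the lock word.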

The first conditional:

if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))

Is exactly like the regular __ticket_spin_is_contended(), and while that
is a somewhat overly clever way of writing head != tail, I don't see a
problem with it.
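
Spelled out with explicit temporaries (the head/tail names are just for
illustration, following the naming above), the same test reads:

	int head = (tmp >> TICKET_SHIFT) & TICKET_MASK;	/* next ticket to hand out */
	int tail = tmp & TICKET_MASK;			/* ticket currently served */

	if (head == tail)
		return;	/* nobody holds the lock */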

The second conditional:

while ((lock->slock & TICKET_MASK) == (tmp & TICKET_MASK))

Is indeed different: it waits for the lock tail (a new load) to change
from the first observed (first load) tail. Once we observe the tail
index changing, we know the previous owner has completed and we can drop
out.
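
FWIW, if the objection is mainly the open-coded masking and the missing
ACCESS_ONCE() in the loop, an (untested) sketch that keeps a single
masking style would be:

static inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
{
	int tmp = ACCESS_ONCE(lock->slock);
	int head = (tmp >> TICKET_SHIFT) & TICKET_MASK;
	int tail = tmp & TICKET_MASK;

	if (head == tail)
		return;	/* not locked */

	/* wait for the tail index to move past the current holder */
	while ((ACCESS_ONCE(lock->slock) & TICKET_MASK) == tail)
		cpu_relax();
}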

Anyway, if I can indeed get rid of my unlock_wait usage it's all moot;
there aren't many users of this primitive.