Re: Q: schedule() and implied barriers on arm64
From: Paul E. McKenney
Date: Fri Oct 16 2015 - 13:28:18 EST

On Fri, Oct 16, 2015 at 05:55:35PM +0100, Catalin Marinas wrote:
> I'll try to reply in Will's absence, though I gave up trying to
> understand these threads long time ago ;).
>
> On 16 October 2015 at 17:16, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Fri, Oct 16, 2015 at 09:04:22AM -0700, Paul E. McKenney wrote:
> >> On Fri, Oct 16, 2015 at 05:18:30PM +0200, Peter Zijlstra wrote:
> >> > If so, however, I suspect AARGH64 is borken and would need (just like
> >> > PPC):
> >> >
> >> > 	#define smp_mb__before_spinlock() smp_mb()
> >> >
> >> > The problem is that schedule() (when a NO-OP) does:
> >> >
> >> > 	smp_mb__before_spinlock();
> >> > 	LOCK rq->lock
> >> >
> >> > 	clear_bit()
> >> >
> >> > 	UNLOCK rq->lock
> >> >
> >> > And nothing there implies a full barrier on AARGH64, since
> >> > smp_mb__before_spinlock() defaults to WMB, LOCK is an "ldaxr" or
> >> > load-acquire, UNLOCK is "stlrh" or store-release and clear_bit() isn't
> >> > anything.
> >> >
> >> > Pretty much every other arch has LOCK implying a full barrier, either
> >> > because its strongly ordered or because it needs one for the ACQUIRE
> >> > semantics.
> >>
> >> But I thought that it used a dmb in the spinlock code somewhere or
> >> another...
> >
> > arm does, arm64 not so much.
>
> arm64 indeed does not have a dmb after spin_lock, it only has a
> load-acquire. So with the default smp_mb__before_spinlock() +
> spin_lock we have:
>
>   smp_wmb()
>   loop
>     load-acquire
>     store
>
> So (I think) this guarantees that any writes before wmb+lock would be
> visible before any reads _and_ writes after wmb+lock. However, the
> ordering with reads before wmb+lock is not guaranteed.

So RCU needs the following sort of guarantee:

	void task1(unsigned long flags)
	{
		WRITE_ONCE(x, 1);
		WRITE_ONCE(z, 1);
		raw_spin_unlock_irqrestore(&rnp->lock, flags);
	}

	void task2(unsigned long *flags)
	{
		raw_spin_lock_irqsave(&rnp->lock, *flags);
		smp_mb__after_unlock_lock();
		r1 = READ_ONCE(y);
		r2 = READ_ONCE(z);
	}

	void task3(void)
	{
		WRITE_ONCE(y, 1);
		smp_mb();
		r3 = READ_ONCE(x);
	}

	BUG_ON(!r1 && r2 && !r3); /* After the dust settles. */

In other words, if task2() acquires the lock after task1() releases it,
all CPUs must agree on the order of the operations in the two critical
sections, even if these other CPUs don't acquire the lock.

This same guarantee is needed if task1() and then task2() run in
succession on the same CPU with no additional synchronization of any sort.
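
As a sanity check on the litmus test itself (not on arm64), the three
tasks can be run through every sequentially consistent interleaving in
plain userspace C. This is only a sketch under assumptions that are not
in the kernel code above: the step encoding in enum op, struct state,
struct counts, and run_litmus() are hypothetical names, rnp->lock is
modeled as a boolean that task1() already holds on entry, and smp_mb()
and smp_mb__after_unlock_lock() have no steps of their own because the
model is already sequentially consistent. It says nothing about what the
hardware does; it only shows that the BUG_ON() condition is unreachable
when all CPUs agree on a single order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step encoding for the three tasks; not kernel code. */
enum op { W_X, W_Z, UNLOCK, LOCK, R_Y, R_Z, W_Y, R_X };

/* task1 is assumed to hold the lock on entry, so only its unlock is modeled. */
static const enum op task1[] = { W_X, W_Z, UNLOCK };
static const enum op task2[] = { LOCK, R_Y, R_Z };
/* smp_mb() has no step of its own: the model is sequentially consistent. */
static const enum op task3[] = { W_Y, R_X };

static const enum op *const prog[3] = { task1, task2, task3 };
static const int len[3] = { 3, 3, 2 };

struct state {
	int pc[3];		/* next step of each task */
	int x, y, z;		/* shared variables, initially 0 */
	int r1, r2, r3;		/* values observed by the reads */
	bool locked;		/* models rnp->lock */
};

struct counts {
	long total;		/* complete interleavings explored */
	long forbidden;		/* those ending with !r1 && r2 && !r3 */
};

/* Depth-first search over every interleaving the lock permits. */
static void explore(struct state s, struct counts *c)
{
	bool finished = true;

	for (int t = 0; t < 3; t++) {
		struct state n = s;

		if (s.pc[t] == len[t])
			continue;	/* this task has no steps left */
		finished = false;

		switch (prog[t][n.pc[t]++]) {
		case W_X: n.x = 1; break;
		case W_Z: n.z = 1; break;
		case W_Y: n.y = 1; break;
		case R_X: n.r3 = n.x; break;
		case R_Y: n.r1 = n.y; break;
		case R_Z: n.r2 = n.z; break;
		case UNLOCK:
			assert(s.locked);	/* only the holder unlocks */
			n.locked = false;
			break;
		case LOCK:
			if (s.locked)
				continue;	/* blocked: try the next task */
			n.locked = true;
			break;
		}
		explore(n, c);
	}

	if (finished) {
		c->total++;
		if (!s.r1 && s.r2 && !s.r3)
			c->forbidden++;
	}
}

struct counts run_litmus(void)
{
	struct state init = { .locked = true };	/* held by task1 */
	struct counts c = { 0, 0 };

	explore(init, &c);
	return c;
}
```

Under these assumptions run_litmus() visits 28 interleavings and none of
them ends with !r1 && r2 && !r3, so any combination of lock and barrier
implementations that lets that outcome appear is weaker than the
guarantee described above.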

Does this work on arm64?

							Thanx, Paul