Re: [PATCH v3] powerpc: spinlock: Fix spin_unlock_wait()
From: Boqun Feng
Date: Thu Jun 09 2016 - 23:03:31 EST
On Fri, Jun 10, 2016 at 01:25:03AM +0800, Boqun Feng wrote:
> On Thu, Jun 09, 2016 at 10:23:28PM +1000, Michael Ellerman wrote:
> > On Wed, 2016-06-08 at 15:59 +0200, Peter Zijlstra wrote:
> > > On Wed, Jun 08, 2016 at 11:49:20PM +1000, Michael Ellerman wrote:
> > >
> > > > > Ok; what tree does this go in? I have this dependent series which I'd
> > > > > like to get sorted and merged somewhere.
> > > >
> > > > Ah sorry, I didn't realise. I was going to put it in my next (which doesn't
> > > > exist yet but hopefully will early next week).
> > > >
> > > > I'll make a topic branch with just that commit based on rc2 or rc3?
> > >
> > > Works for me; thanks!
> >
> > Unfortunately the patch isn't 100%.
> >
> > It's causing some of my machines to lock up hard, which isn't surprising when
> > you look at the generated code for the non-atomic spin loop:
> >
> > c00000000009af48: 7c 21 0b 78 mr r1,r1 # HMT_LOW
> > c00000000009af4c: 40 9e ff fc bne cr7,c00000000009af48 <.do_exit+0x6d8>
> >
>
> There isn't even any code checking for SHARED_PROCESSOR here, so I assume
> your config is !PPC_SPLPAR.
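>
> (IIRC, SHARED_PROCESSOR is roughly the following -- quoting from memory, not
> verbatim from asm/spinlock.h:
>
> 	#ifdef CONFIG_PPC_SPLPAR
> 	#define SHARED_PROCESSOR	(lppaca_shared_proc(local_paca->lppaca_ptr))
> 	#else
> 	#define SHARED_PROCESSOR	0
> 	#endif
>
> so with !PPC_SPLPAR the whole "if (SHARED_PROCESSOR) __spin_yield(lock);"
> branch is compiled away.)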
>
> > Which is a spin loop waiting for a result in cr7, but with no comparison.
> >
> > The problem seems to be that we did:
> >
> > @@ -184,7 +184,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> >  	if (arch_spin_value_unlocked(lock_val))
> >  		goto out;
> >
> > -	while (lock->slock) {
> > +	while (!arch_spin_value_unlocked(*lock)) {
> >  		HMT_low();
> >  		if (SHARED_PROCESSOR)
> >  			__spin_yield(lock);
> >
>
> And as I also did a consolidation in this patch, we now share the same
> piece of arch_spin_unlock_wait(), so with !PPC_SPLPAR the previous loop
> becomes:
>
> 	while (!arch_spin_value_unlocked(*lock)) {
> 		HMT_low();
> 	}
>
> and given that HMT_low() is not a compiler barrier, the compiler is free to
> optimize the load and compare out of the loop.
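>
> A minimal standalone sketch of the same problem (the names here are made up,
> just to show the transformation the compiler is allowed to do):
>
> 	static int flag;		/* plain, non-volatile */
>
> 	void wait_for_flag(void)
> 	{
> 		/*
> 		 * Nothing in the loop is a compiler barrier or a volatile
> 		 * access, so the compiler may load 'flag' once and then
> 		 * spin on the cached value forever -- the same shape as
> 		 * the "bne cr7,<self>" loop in your disassembly.
> 		 */
> 		while (flag)
> 			;
> 	}
>
> The fix, in any of the options below, is to make the load itself volatile,
> so the compiler has to re-load the lock word on every iteration.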
>
> > Which seems to be hiding the fact that lock->slock is volatile from the
> > compiler, even though arch_spin_value_unlocked() is inline. Not sure if that's
> > our bug or gcc's.
> >
>
> I think the access isn't treated as volatile because
> arch_spin_value_unlocked() takes the value of the lock rather than the
> address of the lock as its parameter, which makes it a pure function.
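>
> For reference, the helper looks roughly like this (from memory, not a
> verbatim quote of asm/spinlock.h):
>
> 	static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> 	{
> 		return lock.slock == 0;
> 	}
>
> Since it only looks at the value copy it was passed, whether the load is
> volatile is decided entirely at the call site, and
> arch_spin_value_unlocked(*lock) hands the compiler a plain load that it is
> free to hoist out of the loop.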
>
> To fix this we can add a READ_ONCE() for the read of the lock value, like
> the following:
>
> 	while (!arch_spin_value_unlocked(READ_ONCE(*lock))) {
> 		HMT_low();
> 		...
>
> Or would you prefer simply using lock->slock, which is already a volatile
> variable?
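>
> That would basically bring back the old loop, i.e. something like (just a
> sketch):
>
> 	while (lock->slock) {
> 		HMT_low();
> 		if (SHARED_PROCESSOR)
> 			__spin_yield(lock);
> 	}
>
> which relies on ->slock being volatile, so each iteration re-loads the lock
> word from memory.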
>
> Or maybe we can refactor the code a little like this:
>
> static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> {
> 	arch_spinlock_t lock_val;
>
> 	smp_mb();
>
> 	/*
> 	 * Atomically load and store back the lock value (unchanged). This
> 	 * ensures that our observation of the lock value is ordered with
> 	 * respect to other lock operations.
> 	 */
> 	__asm__ __volatile__(
> "1:	" PPC_LWARX(%0, 0, %2, 0) "\n"
> "	stwcx. %0, 0, %2\n"
> "	bne- 1b\n"
> 	: "=&r" (lock_val), "+m" (*lock)
> 	: "r" (lock)
> 	: "cr0", "xer");
>
> 	while (!arch_spin_value_unlocked(lock_val)) {
> 		HMT_low();
> 		if (SHARED_PROCESSOR)
> 			__spin_yield(lock);
>
> 		lock_val = READ_ONCE(*lock);
> 	}
> 	HMT_medium();
>
> 	smp_mb();
> }
>
This version will generate the correct code for the loop if !PPC_SPLPAR:

c00000000009fa70: 78 0b 21 7c mr r1,r1
c00000000009fa74: ec 06 37 81 lwz r9,1772(r23)
c00000000009fa78: 00 00 a9 2f cmpdi cr7,r9,0
c00000000009fa7c: f4 ff 9e 40 bne cr7,c00000000009fa70 <do_exit+0xf0>
c00000000009fa80: 78 13 42 7c mr r2,r2

The reason I used arch_spin_value_unlocked() was to be consistent with
arch_spin_is_locked(), but most of our lock primitives use ->slock
directly, so I don't see a strong reason to use arch_spin_value_unlocked()
here. That said, this version does save a few lines of code and makes the
logic a little clearer, I think.
Thoughts?
Regards,
Boqun