Re: [PATCH] locking/rwsem: Synchronize task state & waiter->task of readers

From: Andrea Parri
Date: Mon Apr 23 2018 - 16:55:31 EST


Hi Waiman,

On Mon, Apr 23, 2018 at 12:46:12PM -0400, Waiman Long wrote:
> On 04/10/2018 01:22 PM, Waiman Long wrote:
> > It was observed occasionally on PowerPC systems that there was a reader
> > who had not been woken up but whose waiter->task had been cleared.

Can you provide more details about these observations? (links to LKML
posts, traces, applications used/micro-benchmarks, ...)


> >
> > One probable cause of this missed wakeup may be the fact that the
> > waiter->task and the task state have not been properly synchronized as
> > the lock release-acquire pair of different locks in the wakeup code path
> > does not provide a full memory barrier guarantee.

I guess that by the "pair of different locks" you mean (sem->wait_lock,
p->pi_lock), right? BTW, __rwsem_down_write_failed_common() is calling
wake_up_q() _before_ releasing the wait_lock: did you intend to exclude
this callsite? (why?)


> > So smp_store_mb()
> > is now used to set waiter->task to NULL to provide a proper memory
> > barrier for synchronization.

Mmh; the patch is not introducing an smp_store_mb()... My guess is that
you are thinking of the sequence:

smp_store_release(&waiter->task, NULL);
[...]
smp_mb(); /* added with your patch */

or what am I missing?


> >
> > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> > ---
> > kernel/locking/rwsem-xadd.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index e795908..b3c588c 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -209,6 +209,23 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
> > smp_store_release(&waiter->task, NULL);
> > }
> >
> > + /*
> > + * To avoid missed wakeup of reader, we need to make sure
> > + * that task state and waiter->task are properly synchronized.
> > + *
> > + * wakeup sleep
> > + * ------ -----
> > + * __rwsem_mark_wake: rwsem_down_read_failed*:
> > + * [S] waiter->task [S] set_current_state(state)
> > + * MB MB
> > + * try_to_wake_up:
> > + * [L] state [L] waiter->task
> > + *
> > + * For the wakeup path, the original lock release-acquire pair
> > + * does not provide enough guarantee of proper synchronization.
> > + */
> > + smp_mb();
> > +
> > adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
> > if (list_empty(&sem->wait_list)) {
> > /* hit end of list above */
>
> Ping!
>
> Any thought on this patch?
>
> I am wondering if there is a cheaper way to apply the memory barrier
> just on architectures that need it.

try_to_wake_up() does:

raw_spin_lock_irqsave(&p->pi_lock, flags);
smp_mb__after_spinlock();
if (!(p->state & state))

My understanding is that this smp_mb__after_spinlock() provides us with
the guarantee you described above. The smp_mb__after_spinlock() should
represent a 'cheaper way' to provide such a guarantee.

If this understanding is correct, the remaining question would be about
whether you want to rely on (and document) the smp_mb__after_spinlock()
in the callsite in question (the comment in wake_up_q()

/*
* wake_up_process() implies a wmb() to pair with the queueing
* in wake_q_add() so as not to miss wakeups.
*/

does not appear to be sufficient...).

Andrea


>
> Cheers,
> Longman
>