Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()

From: Oleg Nesterov
Date: Fri Sep 02 2016 - 08:06:54 EST


On 09/01, Peter Zijlstra wrote:
>
> On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:
>
> > We do not need anything tricky to avoid the race,
>
> The race being:
>
> CPU0 CPU1 CPU2
>
> __wait_on_bit_lock()
> bit_wait_io()
> io_schedule()
>
> clear_bit_unlock()
> __wake_up_common(.nr_exclusive=1)
> list_for_each_entry()
> if (curr->func() && --nr_exclusive)
> break
>
> signal()
>
> if (signal_pending_state()) == TRUE
> return -EINTR
>
> And no progress because CPU1 exits without acquiring the lock and CPU0
> thinks its done because it woke someone.

Yes,

> > we can just call finish_wait() if action() fails.
>
> That would be bit_wait*() returning -EINTR because sigpending.

Hmm. Not sure I understand... Let me reply just in case, even if
I am sure you get it right.

Yes, in the likely case we are going to fail with -EINTR, but only
if test-and-set after thar fails.

> Sure, you can always call that, first thing through the loop does
> prepare again, so no harm. That however does not connect to your
> condition,.. /me puzzled

If ->action() fails we will abort the loop in any case, prepare
won't be called. So in this case finish_wait() does the right thing.

> > test_and_set_bit() implies mb() so
> > the lockless list_empty_careful() case is fine, we can not miss the
> > condition if we race with unlock_page().
>
> You're talking about this ordering?:
>
> finish_wait() clear_bit_unlock();
> list_empty_careful()
>
> /* MB implied */ smp_mb__after_atomic();
> test_and_set_bit() wake_up_page()
> ...
> autoremove_wake_function()
> list_del_init();
>
>
> That could do with spelling out I feel.. :-)

Yes, yes.

> > __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> > wait_bit_action_f *action, unsigned mode)
> > {
> > + int ret = 0;
> >
> > + for (;;) {
> > prepare_to_wait_exclusive(wq, &q->wait, mode);
> > + if (test_bit(q->key.bit_nr, q->key.flags)) {
> > + ret = action(&q->key, mode);
> > + /*
> > + * Ensure that clear_bit() + wake_up() right after
> > + * test_and_set_bit() below can't see us; it should
> > + * wake up another exclusive waiter if we fail.
> > + */
> > + if (ret)
> > + finish_wait(wq, &q->wait);
> > + }
> > + if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
>
> So this is the actual difference, instead of failing the lock and
> aborting on signal, we acquire the lock if possible. If its not
> possible, someone else has it, which guarantees that someone else will
> do an unlock which implies another wakeup and life goes on.

Yes. This way we eliminate the need for the additional wake_up.

Oleg.