Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()

From: Peter Zijlstra
Date: Thu Sep 01 2016 - 17:14:42 EST


On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:

> We do not need anything tricky to avoid the race,

The race being:

CPU0 CPU1 CPU2

__wait_on_bit_lock()
bit_wait_io()
io_schedule()

clear_bit_unlock()
__wake_up_common(.nr_exclusive=1)
list_for_each_entry()
if (curr->func() && --nr_exclusive)
break

signal()

if (signal_pending_state()) == TRUE
return -EINTR

And no progress because CPU1 exits without acquiring the lock and CPU0
thinks its done because it woke someone.

> we can just call finish_wait() if action() fails.

That would be bit_wait*() returning -EINTR because sigpending.

Sure, you can always call that, first thing through the loop does
prepare again, so no harm. That however does not connect to your
condition,.. /me puzzled

> test_and_set_bit() implies mb() so
> the lockless list_empty_careful() case is fine, we can not miss the
> condition if we race with unlock_page().

You're talking about this ordering?:

finish_wait() clear_bit_unlock();
list_empty_careful()

/* MB implied */ smp_mb__after_atomic();
test_and_set_bit() wake_up_page()
...
autoremove_wake_function()
list_del_init();


That could do with spelling out I feel.. :-)

> __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> wait_bit_action_f *action, unsigned mode)
> {
> + int ret = 0;
>
> + for (;;) {
> prepare_to_wait_exclusive(wq, &q->wait, mode);
> + if (test_bit(q->key.bit_nr, q->key.flags)) {
> + ret = action(&q->key, mode);
> + /*
> + * Ensure that clear_bit() + wake_up() right after
> + * test_and_set_bit() below can't see us; it should
> + * wake up another exclusive waiter if we fail.
> + */
> + if (ret)
> + finish_wait(wq, &q->wait);
> + }
> + if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {

So this is the actual difference, instead of failing the lock and
aborting on signal, we acquire the lock if possible. If its not
possible, someone else has it, which guarantees that someone else will
do an unlock which implies another wakeup and life goes on.

> + if (!ret)
> + finish_wait(wq, &q->wait);
> + return 0;
> + } else if (ret) {
> + return ret;
> + }
> + }
> }

> I am not sure we even want to conditionalize both finish_wait()'s,
> we could simply call it unconditionally and once before test_and_set(),
> the spurious wakeup is unlikely case.


ret = 0;

for (;;) {
prepare_to_wait_exclusive(wq, &q->wait, mode);

if (test_bit(&q->key.bit_nr, &q->key.flag))
ret = action(&q->key, mode);

if (!test_and_set_bit(&q->key.bit_nr, &q->key.flag)) {
/* we got the lock anyway, ignore the signal */
ret = 0;
break;
}

if (ret)
break;
}
finish_wait(wq, &q->wait);

return ret;


Would not that work too?