Re: [RFC PATCH] mm: silence soft lockups from unlock_page

From: Oleg Nesterov
Date: Thu Jul 23 2020 - 08:48:00 EST


On 07/22, Linus Torvalds wrote:
>
> Comments? Oleg, this should fix the race you talked about too.

Yes.

I still can't convince myself thatI fully understand this patch but I see
nothing really wrong after a quick glance...

> + * We can no longer use 'wait' after we've done the
> + * list_del_init(&wait->entry),

Yes, but see below,

> + * the target may decide it's all done with no
> + * other locking, and 'wait' has been allocated on
> + * the stack of the target.
> */
> - if (test_bit(key->bit_nr, &key->page->flags))
> - return -1;
> + target = wait->private;
> + smp_mb();
>
> - return autoremove_wake_function(wait, mode, sync, key);
> + /*
> + * Ok, we have successfully done what we're waiting for.
> + *
> + * Now unconditionally remove the wait entry, so that the
> + * waiter can use that to see success or not.
> + *
> + * We _really_ should have a "list_del_init_careful()"
> + * to properly pair with an unlocked "list_empty_careful()".
> + */
> + list_del_init(&wait->entry);
> +
> + /*
> + * Theres's another memory barrier in the wakup path, that
> + * makes sure the wakup happens after the above is visible
> + * to the target.
> + */
> + wake_up_state(target, mode);

We can no longer use 'target'. If it was already woken up it can notice
list_empty_careful(), return without taking q->lock, and exit.

Of course, this is purely theoretical... rcu_read_lock() should help
but perhaps we can avoid it somehow?

Say, can't we abuse WQ_FLAG_WOKEN?

wake_page_function:
wait->flags |= WQ_FLAG_WOKEN;
wmb();
autoremove_wake_function(...);

wait_on_page_bit_common:

for (;;) {
set_current_state();
if (wait.flags & WQ_FLAG_WOKEN)
break;
schedule();
}

finish_wait();

rmb();
return wait.flags & WQ_FLAG_WOKEN ? 0 : -EINTR;

Another (cosmetic) problem is that wake_up_state(mode) looks confusing.
It is correct but only because we know that mode == TASK_NORMAL and thus
wake_up_state() can'fail if the target is still blocked.

> + spin_lock_irq(&q->lock);
> + SetPageWaiters(page);
> + if (!trylock_page_bit_common(page, bit_nr, behavior))
> + __add_wait_queue_entry_tail(q, wait);

do we need SetPageWaiters() if trylock() succeeds ?

Oleg.