Re: [RFC PATCH] mm: silence soft lockups from unlock_page

From: Linus Torvalds
Date: Fri Aug 07 2020 - 15:08:17 EST

On Fri, Aug 7, 2020 at 11:41 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> +
> + /*
> + * If we hoped to pass PG_locked on to the next locker, but found
> + * no exclusive taker, then we must clear it before dropping q->lock.
> + * Otherwise unlock_page() can race trylock_page_bit_common(), and if
> + * PageWaiters was already set from before, then cmpxchg sees no
> + * difference to send it back to wake_up_page_bit().
> + *
> + * We may have already dropped and retaken q->lock on the way here, but
> + * I think this works out because new entries are always added at tail.
> + */
> + if (exclude && !exclusive)
> + clear_bit(bit_nr, &page->flags);
> +
> spin_unlock_irqrestore(&q->lock, flags);

Yeah, I started thinking about this, and that's when I decided to just
throw away the patch.

Because now it clears the bit *after* it has woken people up, and that
just made me go "Eww".

You did add a comment about the whole "always added to the tail"
thing, and the spinlock should serialize everything, so I guess it's
ok (because the spinlock should serialize things that care), but it
just feels wrong.

I also started worrying about other people waiting on the page lock
(ie we now have that whole io_uring case), and interaction with the
PG_writeback case etc, and it just ended up feeling very messy.

I think it was all fine - other cases won't hit that exclusive case at
all - but I had a hard time wrapping my little mind around the exact
handoff rules, and the cmpxchg behavior when other bits changed (eg
PG_waiters), so I gave up.

> My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus
> your two patches): I did not have in the io_uring changes from the
> current tree. In glancing there, I noticed one new and one previous
> instance of SetPageWaiters() *after* __add_wait_queue_entry_tail():
> are those correct?

I don't think SetPageWaiters() has any ordering constraints with the
wait queue. Nobody looks at the waitqueue unless they already got to
the slowpath.

The only ordering constraint is with the testing of the bit we're
going to wait on. Because doing

if (test_and_set_bit())

is wrong - there's a race in there when somebody can clear the bit,
and not see that there are waiters.

And obviously that needs to be done inside the spinlock, but once you
do that, the ordering of the actual wait-queue is entirely irrelevant.
The spinlock will order _that_ part. The only thing the spinlock won't
order and serialize is the PG_lock bit and making sure we get to the
slowpath in the first place.

So if you're talking about __wait_on_page_locked_async(), then I think
that part is ok.

Or am I missing something?