Re: kernel BUG at mm/page-writeback.c:LINE!

From: Linus Torvalds
Date: Tue Jan 05 2021 - 14:32:40 EST


On Mon, Jan 4, 2021 at 7:29 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> > But I feel it's really that end_page_writeback() itself is
> > fundamentally buggy, because the "wakeup" is not atomic with the bit
> > clearing _and_ it doesn't actually hold the page lock that is
> > allegedly serializing this all.
>
> And we won't be adding a lock_page() into end_page_writeback()!

Right.

However, the more I think about this, the more I feel
end_page_writeback() is kind of ok, and the real problem is that we
should have had a "lock/unlock" pattern for the PG_writeback bit
instead.

Instead, what we have is basically a special "wait for bit to clear",
and then external sychronization - even if lacking - for set/clear
bit.

And the "wait for bit to clear" and "set bit" aren't even adjacent -
the waiting happens in write_cache_pages(), but then the actual
setting happens later in the actual "(*writepage)()" function.

What _would_ be nicer, I feel, is if write_cache_pages() simply did
the equivalent of "lock_page()" except for setting the PG_writeback
bit. The whole "wait for bit to clear" is fundamentally a much more
ambiguous thing for that whole "what if somebody else got it" reason,
but it's also nasty because it can be very unfair (the same way our
"lock_page()" itself used to be very unfair).

IOW, you can have that situation where one actor continually waits for
the bit to clear, but then somebody else comes in and gets the bit
instead.

Of course, writeback is much less likely than lock_page(), so it
probably doesn't happen much in practice.

And honestly, while I think it would be good to make the rule be
"writepage function was entered with the writeback bit already held",
that kind of change would be a major pain. We have at leastr 49
different 'writepage' implementations, and while I think a few of them
end up just being block_write_full_page(), it means that we have a lot
of that

BUG_ON(PageWriteback(page));
set_page_writeback(page);

pattern in various random filesystem code that all would have to be changed.

So I think I know what I'd _like_ the code to be like, but I don't
think it's worth it.

> > So the one-liner of changing the "if" to "while" in
> > wait_on_page_writeback() should get us back to what we used to do.
>
> I think that is the realistic way to go.

Yeah, that's what I'll do.

> > Which makes me think that the best option would actually be to move
> > the bit clearing _into_ wake_up_page(). But that looks like a very big
> > change.

See above - having thought about this overnight, I don't think this is
even the best change.

> But I expect you to take one look and quickly opt instead for the "while"
> in wait_on_page_writeback(): which is not so bad now that you've shown a
> scenario which justifies it.

[ patch removed - I agree, this pain isn't worth it ]

Linus