Re: Memory hotplug softlock issue

From: Hugh Dickins
Date: Tue Nov 20 2018 - 22:20:53 EST


On Tue, 20 Nov 2018, Hugh Dickins wrote:
> On Tue, 20 Nov 2018, Vlastimil Babka wrote:
> > >
> > > finish_wait(q, wait);
> >
> > ... the code continues by:
> >
> > if (thrashing) {
> > if (!PageSwapBacked(page))
> >
> > So maybe we should not set 'thrashing' true when lock < 0?
>
> Very good catch, thank you Vlastimil: as you might have guessed, the
> patch from a pre-PSI kernel applied cleanly, and I just hadn't reviewed
> the surrounding context properly before sending out.
>
> I cannot say immediately what the right answer is, I'll have to do some
> research first: maybe not enter the block that sets thrashing true when
> lock < 0, as you suggest, or maybe force lock < 0 to 0 and put_page()
> afterwards, or...

... I still won't adjust the patch tonight, but the answer is obvious
now I look closer: as you show in your extract above, the only thing
it does with "page" at the end is to ask if it was SwapBacked, so we
just need to set one more bool at the beginning to check at the end
(or I could make "thrashing" a -1, 0, 1 int like "lock": but my guess
is that that would not be to other people's taste: acceptable for the
arg, but stretching your patience for the local variable).

By the way, I do have a further patch to wait_on_page_bit_common(),
which I could send at the same time, if it sounds right to you
(but it's a no-op in the put_and_wait page migration case). That
__add_wait_queue_entry_tail() is right for the first time into
the loop, but I maintain that it should use __add_wait_queue()
for fairness thereafter, to avoid repeatedly sending older waiters
back to the back of the queue. I don't have hard numbers for it,
but it's one of several patches, each of which helped to reduce our
wait_on_page_bit lockups in some (perhaps unrealistic) stress tests.

Hugh