Re: [PATCH] mm: page_alloc: update the COMPACT[STALL|FAIL] events properly

From: Vlastimil Babka
Date: Tue Feb 09 2021 - 11:08:58 EST


On 2/5/21 11:28 PM, David Rientjes wrote:
> On Tue, 2 Feb 2021, Charan Teja Kalla wrote:
>
>> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> >> index 519a60d..531f244 100644
>> >> --- a/mm/page_alloc.c
>> >> +++ b/mm/page_alloc.c
>> >> @@ -4152,6 +4152,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>> >> memalloc_noreclaim_restore(noreclaim_flag);
>> >> psi_memstall_leave(&pflags);
>> >>
>> >> + if (*compact_result == COMPACT_SKIPPED)
>> >> + return NULL;
>> >> /*
>> >> * At least in one zone compaction wasn't deferred or skipped, so let's
>> >> * count a compaction stall
>> >
>> > This makes sense, I wonder if it would also be useful to check that
>> > page == NULL, either in try_to_compact_pages() or here for
>> > COMPACT_SKIPPED?
>>
>> In the code, when COMPACT_SKIPPED is being returned, the page will
>> always be NULL. So, I'm not sure how much useful it is for the page ==
>> NULL check here. Or I failed to understand your point here?
>>
>
> Your code is short-circuiting the rest of __alloc_pages_direct_compact()
> where the return value is dictated by whether page is NULL or non-NULL.
> We can't leak a captured page if we are testing for it being NULL or
> non-NULL, which is what the rest of __alloc_pages_direct_compact() does
> *before* your change. So the idea was to add a check the page is actually
> NULL here since you are now relying on the return value of
> compact_zone_order() to be COMPACT_SKIPPED to infer page == NULL.
>
> I agree that's currently true in the code, I was trying to catch any
> errors where current->capture_control.page was non-NULL but
> try_to_compact_pages() returns COMPACT_SKIPPED. There's some complexity
> here.
>
> So my idea was the expand this out to:
>
> if (*compact_result == COMPACT_SKIPPED) {
> VM_BUG_ON(page);
> return NULL;
> }

Note that this may indeed actually trigger due to free page capture, when an IRQ
handler frees the page. See commit b9e20f0da1f5 ("mm, compaction: make capture
control handling safe wrt interrupts") describing how this was happening for
Hugh. So, this VM_BUG_ON() would sooner or later trigger.

It's because while compact_zone() does detect a successful capture and return
COMPACT_SUCCESS, the IRQ-capture can also happen later without being detected -
at any moment until compact_zone_order() resets the current->capture_control to
NULL. And at that point it may be already poised to return COMPACT_SKIPPED.

It might be cleanest to check *capture at the end of compact_zone_order() and
return COMPACT_SUCCESS when non-NULL. Technically it might be not true that
compaction was successful (we were just lucky that IRQ came and freed the page),
but not much harm in that. Better than e.g. the danger of leaking the captured
page which the proposed patch would do due to the shortcut.
The minor downside is that you would count a stall that wasn't really a stall in
case we skipped compaction, but captured a page by luck, but it would be very rare.