Re: [PATCH mm-unstable v15 11/13] mm/khugepaged: avoid unnecessary mTHP collapse attempts

From: Nico Pache

Date: Thu Feb 26 2026 - 15:48:46 EST


On Thu, Feb 26, 2026 at 9:27 AM Usama Arif <usama.arif@xxxxxxxxx> wrote:
>
> On Wed, 25 Feb 2026 20:26:31 -0700 Nico Pache <npache@xxxxxxxxxx> wrote:
>
> > There are cases where, if an attempted collapse fails, all subsequent
> > orders are guaranteed to also fail. Avoid these collapse attempts by
> > bailing out early.
> >
> > Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
> > ---
> > mm/khugepaged.c | 35 ++++++++++++++++++++++++++++++++++-
> > 1 file changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 1c3711ed4513..388d3f2537e2 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1492,9 +1492,42 @@ static int mthp_collapse(struct mm_struct *mm, unsigned long address,
> > ret = collapse_huge_page(mm, collapse_address, referenced,
> > unmapped, cc, mmap_locked,
> > order);
> > - if (ret == SCAN_SUCCEED) {
> > +
> > + switch (ret) {
> > + /* Cases where we continue to the next collapse candidate */
> > + case SCAN_SUCCEED:
> > collapsed += nr_pte_entries;
> > + fallthrough;
> > + case SCAN_PTE_MAPPED_HUGEPAGE:
> > continue;
> > + /* Cases where lower orders might still succeed */
> > + case SCAN_LACK_REFERENCED_PAGE:
> > + case SCAN_EXCEED_NONE_PTE:
> > + case SCAN_EXCEED_SWAP_PTE:
> > + case SCAN_EXCEED_SHARED_PTE:
> > + case SCAN_PAGE_LOCK:
> > + case SCAN_PAGE_COUNT:
> > + case SCAN_PAGE_LRU:
> > + case SCAN_PAGE_NULL:
> > + case SCAN_DEL_PAGE_LRU:
> > + case SCAN_PTE_NON_PRESENT:
> > + case SCAN_PTE_UFFD_WP:
> > + case SCAN_ALLOC_HUGE_PAGE_FAIL:
> > + goto next_order;
> > + /* Cases where no further collapse is possible */
> > + case SCAN_CGROUP_CHARGE_FAIL:
>
> The only one that stands out to me is SCAN_CGROUP_CHARGE_FAIL. The memcg
> charge of a higher-order folio might fail while the charge of a
> lower-order folio could still succeed, no? That said, if the cgroup is
> that tight, continuing the collapse work may not be productive.
>
> Acked-by: Usama Arif <usama.arif@xxxxxxxxx>

Thanks! IIRC, David and I discussed all of these off-list to confirm
their placement. At one point I had this in the 'next_order' group, and
David recommended making it fail instead, for the same reason you give
here: collapsing or charging large-order pages in such a tight cgroup is
likely unproductive and not worth the effort.

In contrast, SCAN_ALLOC_HUGE_PAGE_FAIL does not necessarily indicate a
resource constraint, although it can. We might fail to allocate an
order-N folio because of fragmentation yet easily find an order-(N-1)
one. We could also hit a case where a genuine lack of memory causes the
failure and we iterate all the way down to the lowest enabled order,
which would be unproductive. However, at that point the OOM reaper
should already be active and the system will be cornered in several
other ways, so the extra attempts should be acceptable.

Hopefully that gives some insight into the decisions made here :)

Cheers,
-- Nico

>
> > + case SCAN_COPY_MC:
> > + case SCAN_ADDRESS_RANGE:
> > + case SCAN_NO_PTE_TABLE:
> > + case SCAN_ANY_PROCESS:
> > + case SCAN_VMA_NULL:
> > + case SCAN_VMA_CHECK:
> > + case SCAN_SCAN_ABORT:
> > + case SCAN_PAGE_ANON:
> > + case SCAN_PMD_MAPPED:
> > + case SCAN_FAIL:
> > + default:
> > + return collapsed;
> > }
> > }
> >
> > --
> > 2.53.0
> >
> >
>