Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache
Date: Tue Jun 09 2026 - 07:01:16 EST
On Tue, Jun 9, 2026 at 4:37 AM Lance Yang <lance.yang@xxxxxxxxx> wrote:
>
>
>
> On 2026/6/9 17:32, Nico Pache wrote:
> > On Tue, Jun 9, 2026 at 3:26 AM David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
> >>
> >> On 6/9/26 11:06, Nico Pache wrote:
> >>> On Mon, Jun 8, 2026 at 8:57 AM David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 6/6/26 12:28, Lance Yang wrote:
> >>>>>
> >>>>>
> >>>>> Looks broken for swap PTEs in PMD collapse ...
> >>>>>
> >>>>> collapse_scan_pmd() allows them up to max_ptes_swap and record them in
> >>>>> unmapped, but they don't get a bit in mthp_present_ptes. And then
> >>>>> mthp_collapse() does the check above:
> >>>>
> >>>> Right. I assumed this is implicitly handled by the optimization in collapse_scan_pmd:
> >>>>
> >>>> if (enabled_orders != BIT(HPAGE_PMD_ORDER))
> >>>> max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;
> >>>>
> >>>> But we perform the check a second time.
> >>>>
> >>>>>
> >>>>> nr_occupied_ptes >= nr_ptes - max_ptes_none
> >>>>>
> >>>>> So max_ptes_none=0 + 511 present PTEs + one allowed swap PTE won't even
> >>>>> call collapse_huge_page() for PMD order.
> >>>>>
> >>>>> Shouldn't we account for them in the PMD-order check? Something like:
> >>>>>
> >>>>> if (is_pmd_order(order))
> >>>>> nr_occupied_ptes += unmapped;
> >>>
> >>> This solution seems good for a temporary fixup. but longterm we may
> >>> want something else. I'm still not sure how we plan on supporting
> >>> swapin without causing creep. So I'd be ok with adding a fix for
> >>> legacy PMD behavior until we know how to handle mTHP creep correctly.
> >>>
> >>>> As an alternative, we could either 1) skip the check there for
> >>>> pmd order (as the check was already done); or 2) introduce+maintain
> >>>> a bitmap that tracks non-present PTEs.
> >>>>
> >>>> @@ -1475,7 +1477,9 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> >>>> nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
> >>>> offset + nr_ptes);
> >>>>
> >>>> - if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
> >>>> + /* Check was already done in the caller. */
> >>>> + if (is_pmd_order(order) ||
> >>>> + nr_occupied_ptes >= nr_ptes - max_ptes_none) {
> >>>> enum scan_result ret;
> >>>>
> >>>> collapse_address = address + offset * PAGE_SIZE;
> >>>>
> >>>> 2) would probably be cleanest long-term.
> >>>
> >>> That would be best for future swapin support in mTHP, but I still
> >>> don't think it solves the creep issue.
> >>
> >> It wouldn't, we'd simply maintain the state we collect + rely on in separate
> >> bitmaps. On swapin, we'd have to update/refresh bitmaps I guess.
> >
> > Yeah, I'm saying for the future, it obviously solves this issue here
> > as well, but if we have positional tracking of the swapout, shared,
> > and none PTEs, I think we can use this to determine whether the
> > collapse would lead to creep. If we detect creep would happen it may
> > be best to automatically collapse to the N+1 (or greater) candidate.
> > Just thinking outloud here.
> >
> >>
> >>> Perhaps we could combine the
> >>> two bitmaps to determine if it would make the future collapse eligible
> >>> again? Not sure but ill start thinking about it.
> >>>
> >>> Should I send a fixup for this using Lance's solution? Or does Lance
> >>> want to send a patch out with the fixes tag?
> >>
> >> If Lance could send a fixup, explaining the situation, that would be nice.
>
> Sure, happy to send a fixup :P
>
> Should I send it as a fixup to be folded into this patch, or as a
> separate patch with a Fixes: tag?
Id assume a seperate patch so you can keep credit for the discovery :)
Thank you for all the review you provided on this series, its been
really helpful!
-- Nico
>
> Will get one out soon :)
>
> > OK, I'd appreciate that :)
>
> Cheers!
>