Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop

From: Lance Yang
Date: Thu Jun 06 2024 - 05:39:13 EST


On Thu, Jun 6, 2024 at 4:06 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 06.06.24 10:01, David Hildenbrand wrote:
> > On 06.06.24 05:55, Lance Yang wrote:
> >> On Wed, Jun 5, 2024 at 10:28 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>
> >>> On 05.06.24 16:20, Lance Yang wrote:
> >>>> Hi David,
> >>>>
> >>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 21.05.24 06:02, Lance Yang wrote:
> >>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> >>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
> >>>>>> split the folio.
> >>>>>>
> >>>>>> Since the TTU_SPLIT_HUGE_PMD split will no longer be performed immediately,
> >>>>>> we might encounter a PMD-mapped THP missing the mlock in the VM_LOCKED
> >>>>>> range during the page walk. It’s probably necessary to mlock this THP to
> >>>>>> prevent it from being picked up during page reclaim.
> >>>>>>
> >>>>>> Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
> >>>>>> Suggested-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> >>>>>> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx>
> >>>>>> ---
> >>>>>
> >>>>> [...] again, sorry for the late review.
> >>>>
> >>>> No worries at all, thanks for taking time to review!
> >>>>
> >>>>>
> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>> index ddffa30c79fb..08a93347f283 100644
> >>>>>> --- a/mm/rmap.c
> >>>>>> +++ b/mm/rmap.c
> >>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>          if (flags & TTU_SYNC)
> >>>>>>                  pvmw.flags = PVMW_SYNC;
> >>>>>>
> >>>>>> -        if (flags & TTU_SPLIT_HUGE_PMD)
> >>>>>> -                split_huge_pmd_address(vma, address, false, folio);
> >>>>>> -
> >>>>>>          /*
> >>>>>>           * For THP, we have to assume the worse case ie pmd for invalidation.
> >>>>>>           * For hugetlb, it could be much worse if we need to do pud
> >>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>          mmu_notifier_invalidate_range_start(&range);
> >>>>>>
> >>>>>>          while (page_vma_mapped_walk(&pvmw)) {
> >>>>>> -                /* Unexpected PMD-mapped THP? */
> >>>>>> -                VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> >>>>>> -
> >>>>>>                  /*
> >>>>>>                   * If the folio is in an mlock()d vma, we must not swap it out.
> >>>>>>                   */
> >>>>>>                  if (!(flags & TTU_IGNORE_MLOCK) &&
> >>>>>>                      (vma->vm_flags & VM_LOCKED)) {
> >>>>>>                          /* Restore the mlock which got missed */
> >>>>>> -                        if (!folio_test_large(folio))
> >>>>>> +                        if (!folio_test_large(folio) ||
> >>>>>> +                            (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>>>                                  mlock_vma_folio(folio, vma);

Should we still keep the '!pvmw.pte' here? Something like:

        if (!folio_test_large(folio) || !pvmw.pte)
                mlock_vma_folio(folio, vma);

That way, a PMD-mapped THP that missed its mlock gets mlocked here, which
prevents it from being picked up during page reclaim.
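
To spell out how the three conditions differ, here is a minimal userspace
sketch (plain C, not kernel code). struct walk_state, the mlock_*() helpers
and compare_checks() are made-up names; they only model
folio_test_large(folio), !pvmw.pte and (flags & TTU_SPLIT_HUGE_PMD) as
booleans, and assume that a PMD mapping always implies a large folio:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the state try_to_unmap_one() looks at. */
struct walk_state {
	bool folio_large;    /* models folio_test_large(folio) */
	bool pmd_mapped;     /* models !pvmw.pte */
	bool split_huge_pmd; /* models flags & TTU_SPLIT_HUGE_PMD */
};

/* Before this patch: only small folios get the missed mlock restored. */
static bool mlock_before_patch(const struct walk_state *s)
{
	return !s->folio_large;
}

/* The condition as posted in v6. */
static bool mlock_v6(const struct walk_state *s)
{
	return !s->folio_large || (s->pmd_mapped && s->split_huge_pmd);
}

/* The simplified condition suggested in this mail. */
static bool mlock_simplified(const struct walk_state *s)
{
	return !s->folio_large || s->pmd_mapped;
}

/* Print one row per reachable state; return the number of rows printed. */
static int compare_checks(void)
{
	int rows = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct walk_state s = {
			.folio_large = (bits & 4) != 0,
			.pmd_mapped = (bits & 2) != 0,
			.split_huge_pmd = (bits & 1) != 0,
		};

		/* Assumption: a PMD mapping implies a large folio. */
		if (s.pmd_mapped && !s.folio_large)
			continue;

		/* The simplified check mlocks whenever the v6 check does. */
		assert(!mlock_v6(&s) || mlock_simplified(&s));

		printf("large=%d pmd=%d split=%d -> before=%d v6=%d simplified=%d\n",
		       s.folio_large, s.pmd_mapped, s.split_huge_pmd,
		       mlock_before_patch(&s), mlock_v6(&s),
		       mlock_simplified(&s));
		rows++;
	}
	return rows;
}
```

In this model the v6 and simplified checks only disagree for a PMD-mapped
THP when TTU_SPLIT_HUGE_PMD is not set; calling compare_checks() from a
main() prints the full table.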

David, I’d like to hear your thoughts on this ;)

Thanks,
Lance

> >>>>>
> >>>>> Can you elaborate why you think this would be required? If we would have
> >>>>> performed the split_huge_pmd_address() beforehand, we would still be
> >>>>> left with a large folio, no?
> >>>>
> >>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
> >>>>
> >>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
> >>>> folios, but there are a few scenarios where we don't mlock a large folio, such
> >>>> as when it crosses a VM_LOCKED VMA boundary.
> >>>>
> >>>> -     if (!folio_test_large(folio))
> >>>> +     if (!folio_test_large(folio) ||
> >>>> +         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>
> >>>> And this check is just future-proofing and likely unnecessary. If encountering a
> >>>> PMD-mapped THP missing the mlock for some reason, we can mlock this
> >>>> THP to prevent it from being picked up during page reclaim, since it is fully
> >>>> mapped and doesn't cross the VMA boundary, IIUC.
> >>>>
> >>>> What do you think?
> >>>> I would appreciate any suggestions regarding this check ;)
> >>>
> >>> Reading this patch only, I wonder if this change makes sense in the
> >>> context here.
> >>
> >> Allow me to try explaining it again ;)
> >>
> >>>
> >>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
> >>> reaching this call and skipped it due to "!folio_test_large(folio)".
> >>
> >> Yes, there is only a PTE-mapped THP when doing the "!folio_test_large(folio)"
> >> check, as we will first conditionally split the PMD via
> >> split_huge_pmd_address().
> >>
> >>>
> >>> After this patch, we either
> >>
> >> Things will change. We'll first do the "!folio_test_large(folio)" check, then
> >> conditionally split the PMD via split_huge_pmd_address().
> >>
> >>>
> >>> a) PTE-remap the THP after this check, but retry and end up here again,
> >>> whereby we would skip it due to "!folio_test_large(folio)".
> >>
> >> Hmm...
> >>
> >> IIUC, we will skip it after this check, stop the page walk, and not
> >> PTE-remap the THP.
> >>
> >>>
> >>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that
> >>> co-exist with mlock and what would be the problem here with mlock?
> >>
> >> Before discarding a PMD-mapped THP as a whole, as patch #3 did,
> >> we also perform the "!folio_test_large(folio)" check. If the THP coexists
> >> with mlock, we will skip it, stop the page walk, and not discard it. IIUC.
> >
> > But "!folio_test_large(folio)" would *skip* the THP and not consider it
> > regarding mlock.
> >
> > I'm probably missing something
>
> I'm stupid, I missed that we still do the "goto walk_done_err;", only
> that we don't do the mlock_vma_folio(folio, vma);
>
> Yes, let's drop it for now! :)
>
> --
> Cheers,
>
> David / dhildenb
>