Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one

From: Dev Jain

Date: Mon May 11 2026 - 05:00:53 EST




On 11/05/26 12:40 pm, David Hildenbrand (Arm) wrote:
> On 5/6/26 11:44, Dev Jain wrote:
>> Simplify the code by refactoring the folio_test_hugetlb() branch into
>> a new function.
>>
>> While at it, convert BUG helpers to WARN helpers.
>>
>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
>> ---
>> mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
>> 1 file changed, 69 insertions(+), 48 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a5f067a09de0f..a98acdea0530a 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
>> }
>>
>> +/* Returns false if unmap needs to be aborted */
>> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
>
> I'm wondering whether we should make it clearer that this belongs to the
> try_to_unmap family by calling it
>
> ttu_hugetlb_folio

Yes, I had suggested a ttu_ prefix elsewhere in the first version, but
Lorenzo didn't like it (or perhaps he just didn't like that specific use
of ttu):

https://lore.kernel.org/all/a8b06f36-98e1-435c-881f-67242bc4304a@lucifer.local/

I don't know of a better name than "commit_ttu_lazyfree_folio" for that
case, but for the hugetlb case I like ttu_hugetlb_folio.

>
>> + struct folio *folio, struct page_vma_mapped_walk *pvmw,
>> + struct page *page, enum ttu_flags flags, pte_t *pteval,
>> + struct mmu_notifier_range *range, bool *exit_walk)
>> +{
>> + /*
>> + * The try_to_unmap() is only passed a hugetlb page
>> + * in the case where the hugetlb page is poisoned.
>> + */
>> + VM_WARN_ON_PAGE(!PageHWPoison(page), page);
>
> IIRC, we will never actually get a tail page here.
>
> Can we avoid passing a page by checking instead whether the hugetlb folios is
> marked as having a poisoned page?
>
> See the folio_test_set_hwpoison() in hugetlb_update_hwpoison().
>
> So you can simply use folio_test_hwpoison here instead.

Okay, I will confirm that and do this.

>
>
>> + /*
>> + * huge_pmd_unshare may unmap an entire PMD page.
>> + * There is no way of knowing exactly which PMDs may
>> + * be cached for this mm, so we must flush them all.
>> + * start/end were already adjusted above to cover this
>> + * range.
>> + */
>> + flush_cache_range(vma, range->start, range->end);
>> +
>> + /*
>> + * To call huge_pmd_unshare, i_mmap_rwsem must be
>> + * held in write mode. Caller needs to explicitly
>> + * do this outside rmap routines.
>> + *
>> + * We also must hold hugetlb vma_lock in write mode.
>> + * Lock order dictates acquiring vma_lock BEFORE
>> + * i_mmap_rwsem. We can only try lock here and fail
>> + * if unsuccessful.
>> + */
>> + if (!folio_test_anon(folio)) {
>> + struct mmu_gather tlb;
>> +
>> + VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
>> + if (!hugetlb_vma_trylock_write(vma)) {
>> + *exit_walk = true;
>> + return false;
>> + }
>> +
>> + tlb_gather_mmu_vma(&tlb, vma);
>> + if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
>> + hugetlb_vma_unlock_write(vma);
>> + huge_pmd_unshare_flush(&tlb, vma);
>> + tlb_finish_mmu(&tlb);
>> + /*
>> + * The PMD table was unmapped,
>> + * consequently unmapping the folio.
>> + */
>> + *exit_walk = true;
>> + return true;
>> + }
>> + hugetlb_vma_unlock_write(vma);
>> + tlb_finish_mmu(&tlb);
>> + }
>> + *pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
>> + if (pte_dirty(*pteval))
>> + folio_mark_dirty(folio);
>> +
>> + *exit_walk = false;
>> + return true;
>
>
> Can we instead introduce some enum that tells the caller how to proceed?
>
> I assume we have
>
> (a) Abort walk (ret = false + page_vma_mapped_walk_done())
>
> (b) Walk done (ret = true + page_vma_mapped_walk_done())
>
> (c) Continue walk (call page_vma_mapped_walk())
>
> enum ttu_walk_result {
> TTU_WALK_CONTINUE,
> TTU_WALK_ABORT,
> TTU_WALK_DONE
> }

I had replied to such a suggestion here:

https://lore.kernel.org/all/caa7c455-7472-48eb-a5dc-145e587d67ba@xxxxxxx/

We probably don't have any other solution :)
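
For reference, a rough userspace model of how the enum-returning variant
could look. This is only a sketch, not kernel code: the struct and the
*_model functions are hypothetical stand-ins, the two booleans model the
results of hugetlb_vma_trylock_write() and huge_pmd_unshare(), and the
anon case (which skips both checks) is left out.

```c
#include <stdbool.h>

/* Names follow the enum proposed above; it does not exist in mm/rmap.c. */
enum ttu_walk_result {
	TTU_WALK_CONTINUE,	/* PTE cleared, keep handling this mapping */
	TTU_WALK_ABORT,		/* could not unmap: ret = false, end the walk */
	TTU_WALK_DONE,		/* already unmapped: ret = true, end the walk */
};

/* Hypothetical stand-ins for the two conditions the helper tests. */
struct hugetlb_unmap_model {
	bool trylock_ok;	/* models hugetlb_vma_trylock_write() */
	bool pmd_unshared;	/* models huge_pmd_unshare() */
};

static enum ttu_walk_result
ttu_hugetlb_folio_model(const struct hugetlb_unmap_model *m)
{
	if (!m->trylock_ok)
		return TTU_WALK_ABORT;
	if (m->pmd_unshared)
		return TTU_WALK_DONE;
	return TTU_WALK_CONTINUE;
}

/* Models the caller: returns the final 'ret', reports if the walk ended. */
static bool try_to_unmap_one_model(const struct hugetlb_unmap_model *m,
				   bool *walk_ended)
{
	bool ret = true;

	*walk_ended = false;
	switch (ttu_hugetlb_folio_model(m)) {
	case TTU_WALK_ABORT:
		ret = false;
		*walk_ended = true;
		break;
	case TTU_WALK_DONE:
		*walk_ended = true;
		break;
	default:
		break;
	}
	return ret;
}
```

In the real function the two terminating cases would jump to walk_abort and
walk_done rather than set flags.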

>
>> +}
>> +
>> /*
>> * @arg: enum ttu_flags will be passed to this argument
>> */
>> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> PageAnonExclusive(subpage);
>>
>> if (folio_test_hugetlb(folio)) {
>> - bool anon = folio_test_anon(folio);
>> -
>> - /*
>> - * The try_to_unmap() is only passed a hugetlb page
>> - * in the case where the hugetlb page is poisoned.
>> - */
>> - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
>> - /*
>> - * huge_pmd_unshare may unmap an entire PMD page.
>> - * There is no way of knowing exactly which PMDs may
>> - * be cached for this mm, so we must flush them all.
>> - * start/end were already adjusted above to cover this
>> - * range.
>> - */
>> - flush_cache_range(vma, range.start, range.end);
>> + bool exit_walk;
>>
>> - /*
>> - * To call huge_pmd_unshare, i_mmap_rwsem must be
>> - * held in write mode. Caller needs to explicitly
>> - * do this outside rmap routines.
>> - *
>> - * We also must hold hugetlb vma_lock in write mode.
>> - * Lock order dictates acquiring vma_lock BEFORE
>> - * i_mmap_rwsem. We can only try lock here and fail
>> - * if unsuccessful.
>> - */
>> - if (!anon) {
>> - struct mmu_gather tlb;
>> -
>> - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> - if (!hugetlb_vma_trylock_write(vma))
>> - goto walk_abort;
>> -
>> - tlb_gather_mmu_vma(&tlb, vma);
>> - if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
>> - hugetlb_vma_unlock_write(vma);
>> - huge_pmd_unshare_flush(&tlb, vma);
>> - tlb_finish_mmu(&tlb);
>> - /*
>> - * The PMD table was unmapped,
>> - * consequently unmapping the folio.
>> - */
>> - goto walk_done;
>> - }
>> - hugetlb_vma_unlock_write(vma);
>> - tlb_finish_mmu(&tlb);
>> + ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
>> + flags, &pteval, &range,
>> + &exit_walk);
>> + if (exit_walk) {
>> + page_vma_mapped_walk_done(&pvmw);
>> + break;
>
> In the old walk_abort case you wouldn't set ret = false?

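ret is assigned the return value of unmap_hugetlb_folio(). On the trylock
failure path the helper returns false with *exit_walk = true, so ret is
false in what used to be the walk_abort case.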
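
To spell out what the v3 helper reports through its return value and
*exit_walk, here is a small userspace model. It is only a sketch: the enum
labels and classify() are hypothetical and exist purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes; used only in this model. */
enum walk_outcome {
	WALK_CONTINUE,	/* ret = true,  exit_walk = false */
	WALK_ABORT,	/* ret = false, exit_walk = true  */
	WALK_DONE,	/* ret = true,  exit_walk = true  */
	WALK_INVALID,	/* ret = false, exit_walk = false: never produced */
};

/* Decode the (return value, *exit_walk) pair used by the v3 helper. */
static enum walk_outcome classify(bool ret, bool exit_walk)
{
	if (exit_walk)
		return ret ? WALK_DONE : WALK_ABORT;
	return ret ? WALK_CONTINUE : WALK_INVALID;
}
```

The helper as posted never returns false with *exit_walk = false, so one of
the four combinations is unused.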
>
> When returning the enum you could simply do something like
>
> switch (ret) {
> case TTU_WALK_ABORT:
> goto walk_abort;
> case TTU_WALK_DONE:
> goto walk_done;
> default:
> break;
> }
>
>
> While I like this patch, can we please just move all the hugetlb shite into this
> helper function?
>
> Essentially, get rid of hugetlb special casing in the remainder of the function.
>
> That also makes the function name clearer (right now it's only doing a part of
> hugetlb folio unmapping).

Okay, I can try that. That would mean splitting the pvmw walk into hugetlb
and non-hugetlb paths, but I suspect there would be very little code
duplication.

>