Re: [RFC 03/11] khugepaged: Don't allocate khugepaged mm_slot early

From: Dev Jain
Date: Fri Jan 10 2025 - 01:14:46 EST




On 09/01/25 5:01 am, Nico Pache wrote:
We should only "enter"/allocate the khugepaged mm_slot if we succeed at
allocating the PMD sized folio. Move the khugepaged_enter_vma call until
after we know the vma_alloc_folio was successful.

Why? We have the appropriate checks from thp_vma_allowable_orders() and friends, so the VMA should be registered with khugepaged irrespective of
whether during fault time we are able to allocate a PMD-THP or not. If we fail at fault time, it is the job of khugepaged to try to collapse it later.

Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
---
mm/huge_memory.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e53d83b3e5cf..635c65e7ef63 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1323,7 +1323,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
ret = vmf_anon_prepare(vmf);
if (ret)
return ret;
- khugepaged_enter_vma(vma, vma->vm_flags);
if (!(vmf->flags & FAULT_FLAG_WRITE) &&
!mm_forbids_zeropage(vma->vm_mm) &&
@@ -1365,7 +1364,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
}
return ret;
}
-
+ khugepaged_enter_vma(vma, vma->vm_flags);
return __do_huge_pmd_anonymous_page(vmf);
}

In any case, you are not achieving what you described in the patch description: you have moved khugepaged_enter_vma() after the read fault logic, what you want to do is to move it after vma_alloc_anon_folio_pmd() in __do_huge_pmd_anonymous_page().