On Thu, Jul 31, 2025 at 01:27:19PM +0100, Usama Arif wrote:
From: David Hildenbrand <david@xxxxxxxxxx>
Describing the context through a type is much clearer, and good enough
for our case.
This is pretty bare bones. What context, what type? Under what
circumstances?
This also is missing detail on the key difference here - that actually it
turns out we _don't_ need these to be flags, rather we can have _distinct_
modes which are clearer.
I'd say something like:
when determining which THP orders are eligiible for a VMA mapping,
we have previously specified tva_flags, however it turns out it is
really not necessary to treat these as flags.
Rather, we distinguish between distinct modes.
The only case where we previously combined flags was with
TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is
the default, except for MADV_COLLAPSE or an edge cases in
collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding
a mode specifically for this case - TVA_FORCED_COLLAPSE.
... stuff about the different modes...
We have:
* smaps handling for showing "THPeligible"
* Pagefault handling
* khugepaged handling
* Forced collapse handling: primarily MADV_COLLAPSE, but one other odd case
Can we actually state what this case is? I mean I guess a handwave in the
form of 'an edge case in collapse_pte_mapped_thp()' will do also.
Hmm actually we do weird stuff with this so maybe just handwave.
Like uprobes calls collapse_pte_mapped_thp()... :/ I'm not sure this 'If we
are here, we've succeeded in replacing all the native pages in the page
cache with a single hugepage.' comment is even correct.
Really, we want to ignore sysfs only when we are forcing a collapse
through MADV_COLLAPSE, otherwise we want to enforce.
I'd say 'ignoring this edge case, ...'
I think the clearest thing might be to literally list the before/after
like:
* TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS
* TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT
* TVA_ENFORCE_SYSFS -> TVA_KHUGEPAGED
* 0 -> TVA_FORCED_COLLAPSE
With this change, we immediately know if we are in the forced collapse
case, which will be valuable next.
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
Acked-by: Usama Arif <usamaarif642@xxxxxxxxx>
Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
Overall this is a great cleanup, some various nits however.
---
fs/proc/task_mmu.c | 4 ++--
include/linux/huge_mm.h | 30 ++++++++++++++++++------------
mm/huge_memory.c | 8 ++++----
mm/khugepaged.c | 18 +++++++++---------
mm/memory.c | 14 ++++++--------
5 files changed, 39 insertions(+), 35 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3d6d8a9f13fc..d440df7b3d59 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1293,8 +1293,8 @@ static int show_smap(struct seq_file *m, void *v)
__show_smap(m, &mss, false);
seq_printf(m, "THPeligible: %8u\n",
- !!thp_vma_allowable_orders(vma, vma->vm_flags,
- TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
+ !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
+ THP_ORDERS_ALL));
This !! is so gross, wonder if we could have a bool wrapper. But not a big
deal.
I also sort of _hate_ the smaps flag anyway, invoking this 'allowable
orders' thing just for smaps reporting with maybe some minor delta is just
odd.
Something like `bool vma_has_thp_allowed_orders(struct vm_area_struct
*vma);` would be nicer.
Anyway thoughts for another time... :)
if (arch_pkeys_enabled())
seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 71db243a002e..b0ff54eee81c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -94,12 +94,15 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
#define THP_ORDERS_ALL \
(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
-#define TVA_SMAPS (1 << 0) /* Will be used for procfs */
Dumb question, but what does 'TVA' stand for? :P
-#define TVA_IN_PF (1 << 1) /* Page fault handler */
-#define TVA_ENFORCE_SYSFS (1 << 2) /* Obey sysfs configuration */
+enum tva_type {
+ TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */
How I hate this flag (just an observation...)
+ TVA_PAGEFAULT, /* Serving a page fault. */
+ TVA_KHUGEPAGED, /* Khugepaged collapse. */
This is equivalent to the TVA_ENFORCE_SYSFS case before, sort of a default
I guess, but actually quite nice to add the context that it's sourced from
khugepaged - I assume this will always be the case when specified?
+ TVA_FORCED_COLLAPSE, /* Forced collapse (i.e., MADV_COLLAPSE). */
Would put 'e.g.' here, then that allows 'space' for the edge case...
+};
-#define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
- (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
+#define thp_vma_allowable_order(vma, vm_flags, type, order) \
+ (!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
Nit, but maybe worth keeping tva_ prefix - tva_type - here just so it's
clear what type it refers to.
But not end of the world.
Same comment goes for param names below etc.
+++ b/mm/khugepaged.c
@@ -474,8 +474,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
hugepage_pmd_enabled()) {
- if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
- PMD_ORDER))
+ if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
__khugepaged_enter(vma->vm_mm);
}
}
@@ -921,7 +920,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
struct collapse_control *cc)
{
struct vm_area_struct *vma;
- unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
+ enum tva_type tva_type = cc->is_khugepaged ? TVA_KHUGEPAGED :
+ TVA_FORCED_COLLAPSE;
This is great, this is so much clearer.
A nit though, I mean I come back to my 'type' vs 'tva_type' nit above, this
is inconsistent, so we should choose one approach and stick with it.
if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
return SCAN_ANY_PROCESS;
@@ -932,7 +932,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
return SCAN_ADDRESS_RANGE;
- if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER))
+ if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_type, PMD_ORDER))
return SCAN_VMA_CHECK;
/*
* Anon VMA expected, the address may be unmapped then
@@ -1532,9 +1532,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
* in the page cache with a single hugepage. If a mm were to fault-in
* this memory (mapped by a suitably aligned VMA), we'd get the hugepage
* and map it by a PMD, regardless of sysfs THP settings. As such, let's
- * analogously elide sysfs THP settings here.
+ * analogously elide sysfs THP settings here and pretend we are
+ * collapsing.
I think saying pretending here is potentially confusing, maybe worth saying
'force collapse'?