Re: [RFC PATCH 02/12] khugepaged: Generalize alloc_charge_folio()

From: Dev Jain
Date: Tue Dec 17 2024 - 04:07:25 EST



On 17/12/24 12:23 pm, Ryan Roberts wrote:
On 16/12/2024 16:50, Dev Jain wrote:
Pass order to alloc_charge_folio() and update mTHP statistics.

Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 4 ++++
mm/khugepaged.c | 13 +++++++++----
3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 93e509b6c00e..8b6d0fed99b3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -119,6 +119,8 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_ALLOC,
MTHP_STAT_ANON_FAULT_FALLBACK,
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+ MTHP_STAT_ANON_COLLAPSE_ALLOC,
+ MTHP_STAT_ANON_COLLAPSE_ALLOC_FAILED,
MTHP_STAT_ZSWPOUT,
MTHP_STAT_SWPIN,
MTHP_STAT_SWPIN_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2da5520bfe24..2e582fad4c77 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -615,6 +615,8 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(anon_collapse_alloc, MTHP_STAT_ANON_COLLAPSE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(anon_collapse_alloc_failed, MTHP_STAT_ANON_COLLAPSE_ALLOC_FAILED);
DEFINE_MTHP_STAT_ATTR(zswpout, MTHP_STAT_ZSWPOUT);
DEFINE_MTHP_STAT_ATTR(swpin, MTHP_STAT_SWPIN);
DEFINE_MTHP_STAT_ATTR(swpin_fallback, MTHP_STAT_SWPIN_FALLBACK);
@@ -636,6 +638,8 @@ static struct attribute *anon_stats_attrs[] = {
&anon_fault_alloc_attr.attr,
&anon_fault_fallback_attr.attr,
&anon_fault_fallback_charge_attr.attr,
+ &anon_collapse_alloc_attr.attr,
+ &anon_collapse_alloc_failed_attr.attr,
#ifndef CONFIG_SHMEM
&zswpout_attr.attr,
&swpin_attr.attr,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 95643e6e5f31..02cd424b8e48 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1073,21 +1073,26 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
}
static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
- struct collapse_control *cc)
+ int order, struct collapse_control *cc)
{
gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
GFP_TRANSHUGE);
int node = hpage_collapse_find_target_node(cc);
struct folio *folio;
- folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
+ folio = __folio_alloc(gfp, order, node, &cc->alloc_nmask);
if (!folio) {
*foliop = NULL;
count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
+ if (order != HPAGE_PMD_ORDER)
+ count_mthp_stat(order, MTHP_STAT_ANON_COLLAPSE_ALLOC_FAILED);
Bug? We should be calling count_mthp_stat() for all orders, but only calling
count_vm_event(THP_*) for PMD_ORDER, as per the pattern laid out by the other mTHP stats.

Ah okay.


The aim is for the existing THP stats (which implicitly count only PMD-sized
THP) to continue to count only PMD-sized THP. It's a userspace ABI and we were
scared of the potential to break things if we changed the existing counters'
semantics.

return SCAN_ALLOC_HUGE_PAGE_FAIL;
}
count_vm_event(THP_COLLAPSE_ALLOC);
+ if (order != HPAGE_PMD_ORDER)
+ count_mthp_stat(order, MTHP_STAT_ANON_COLLAPSE_ALLOC);
Same problem.

Also, I agree with Baolin that we don't want "anon" in the name. This is a
generic path that is also used for file-backed memory. So once you fix the bug,
the new stats will count file-backed memory too (although for now, only for
PMD_ORDER).

Sure.

+
if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
folio_put(folio);
*foliop = NULL;
@@ -1124,7 +1129,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
*/
mmap_read_unlock(mm);
- result = alloc_charge_folio(&folio, mm, cc);
+ result = alloc_charge_folio(&folio, mm, order, cc);
Where is order coming from? I'm guessing that's added later, so this patch won't
compile on its own? Perhaps HPAGE_PMD_ORDER for now?

Okay yes, this won't compile on its own. I'll ensure sequential buildability next time.


if (result != SCAN_SUCCEED)
goto out_nolock;
@@ -1850,7 +1855,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
- result = alloc_charge_folio(&new_folio, mm, cc);
+ result = alloc_charge_folio(&new_folio, mm, HPAGE_PMD_ORDER, cc);
if (result != SCAN_SUCCEED)
goto out;
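
For reference, the counting rule requested in the review above (bump the per-order mTHP stat for every order, and the legacy THP event only for PMD order) can be sketched as a small userspace mock. This is only an illustration under stated assumptions, not kernel code: every sketch_* name is hypothetical, the two arrays stand in for count_vm_event() and count_mthp_stat(), and SKETCH_PMD_ORDER assumes 4K base pages with a 2M PMD.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; none of these are kernel symbols. */
#define SKETCH_PMD_ORDER 9	/* assumption: 4K base pages, 2M PMD */
#define SKETCH_NR_ORDERS (SKETCH_PMD_ORDER + 1)

enum sketch_stat {
	SKETCH_COLLAPSE_ALLOC,
	SKETCH_COLLAPSE_ALLOC_FAILED,
	SKETCH_NR_STATS,
};

/* Mock of the legacy THP events, which cover PMD-sized THP only. */
static unsigned long sketch_vm_events[SKETCH_NR_STATS];

/* Mock of the per-order mTHP statistics. */
static unsigned long sketch_mthp_stats[SKETCH_NR_ORDERS][SKETCH_NR_STATS];

/*
 * Account one collapse allocation result for a folio of the given order.
 * The legacy counter keeps its PMD-only meaning, while the per-order
 * counter is bumped for every order, PMD order included.
 */
static void sketch_count_collapse(int order, enum sketch_stat item)
{
	assert(order >= 0 && order < SKETCH_NR_ORDERS);

	if (order == SKETCH_PMD_ORDER)
		sketch_vm_events[item]++;

	sketch_mthp_stats[order][item]++;
}
```

With this shape, a failed order-4 allocation touches only the order-4 per-order counter, while a PMD-order allocation touches both the legacy counter and the PMD-order per-order counter, so the meaning of the existing userspace-visible counters does not change.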