Re: [PATCH mm-unstable v18 08/14] mm/khugepaged: add per-order mTHP collapse failure statistics

From: David Hildenbrand (Arm)

Date: Sun May 31 2026 - 16:09:59 EST

On 5/22/26 17:00, Nico Pache wrote:
> Add three new mTHP statistics to track collapse failures for different
> orders when encountering swap PTEs, excessive none PTEs, and shared PTEs:
>
> - collapse_exceed_swap_pte: Increment when mTHP collapse fails due to
> encountering a swap PTE.
>
> - collapse_exceed_none_pte: Counts when mTHP collapse fails due to
> exceeding the none PTE threshold for the given order
>
> - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to
> encountering a shared PTE.
>
> These statistics complement the existing THP_SCAN_EXCEED_* events by
> providing per-order granularity for mTHP collapse attempts. The stats are
> exposed via sysfs under
> `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each
> supported hugepage size.
>
> As we currently do not support collapsing mTHPs that contain a swap or
> shared entry, those statistics keep track of how often we are
> encountering failed mTHP collapses due to these restrictions.
>
> We will add support for mTHP collapse for anonymous pages next; lets also
> track when this happens at the PMD level within the per-mTHP stats.
>
> Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
> ---
> Documentation/admin-guide/mm/transhuge.rst | 14 ++++++++++++++
> include/linux/huge_mm.h | 3 +++
> mm/huge_memory.c | 7 +++++++
> mm/khugepaged.c | 21 +++++++++++++++++++--
> 4 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index c51932e6275d..80a4d0bed70b 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -714,6 +714,20 @@ nr_anon_partially_mapped
> an anonymous THP as "partially mapped" and count it here, even though it
> is not actually partially mapped anymore.
>
> +collapse_exceed_none_pte
> + The number of collapse attempts that failed due to exceeding the
> + max_ptes_none threshold.
> +
> +collapse_exceed_swap_pte
> + The number of collapse attempts that failed due to exceeding the
> + max_ptes_swap threshold. For non-PMD orders this occurs if a mTHP range
> + contains at least one swap PTE.
> +
> +collapse_exceed_shared_pte
> + The number of collapse attempts that failed due to exceeding the
> + max_ptes_shared threshold. For non-PMD orders this occurs if a mTHP range
> + contains at least one shared PTE.
> +
> As the system ages, allocating huge pages may be expensive as the
> system uses memory compaction to copy data around memory to free a
> huge page for use. There are some counters in ``/proc/vmstat`` to help
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index ba7ae6808544..48496f09909b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -144,6 +144,9 @@ enum mthp_stat_item {
> MTHP_STAT_SPLIT_DEFERRED,
> MTHP_STAT_NR_ANON,
> MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
> + MTHP_STAT_COLLAPSE_EXCEED_SWAP,
> + MTHP_STAT_COLLAPSE_EXCEED_NONE,
> + MTHP_STAT_COLLAPSE_EXCEED_SHARED,
> __MTHP_STAT_COUNT
> };
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 345c54133c83..5c128cdec810 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -703,6 +703,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
> DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
> +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
> +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED)

[...]

> /* See collapse_scan_pmd(). */
> if (folio_maybe_mapped_shared(folio)) {
> + /*
> + * TODO: Support shared pages without leading to further
> + * mTHP collapses. Currently bringing in new pages via
> + * shared may cause a future higher order collapse on a
> + * rescan of the same range.
> + */

This comment actually belongs into an earlier patch, no?

> if (++shared > max_ptes_shared) {
> result = SCAN_EXCEED_SHARED_PTE;
> - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> + if (is_pmd_order(order))
> + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
> goto out;
> }
> }
> @@ -1138,6 +1148,7 @@ static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm,
> * range.
> */
> if (!is_pmd_order(order)) {
> + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> pte_unmap(pte);
> mmap_read_unlock(mm);
> result = SCAN_EXCEED_SWAP_PTE;
> @@ -1433,6 +1444,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> if (++none_or_zero > max_ptes_none) {
> result = SCAN_EXCEED_NONE_PTE;
> count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> + count_mthp_stat(HPAGE_PMD_ORDER,
> + MTHP_STAT_COLLAPSE_EXCEED_NONE);
> goto out_unmap;
> }
> continue;
> @@ -1441,6 +1454,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> if (++unmapped > max_ptes_swap) {
> result = SCAN_EXCEED_SWAP_PTE;
> count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> + count_mthp_stat(HPAGE_PMD_ORDER,
> + MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> goto out_unmap;
> }
> /*
> @@ -1498,6 +1513,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> if (++shared > max_ptes_shared) {
> result = SCAN_EXCEED_SHARED_PTE;
> count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> + count_mthp_stat(HPAGE_PMD_ORDER,
> + MTHP_STAT_COLLAPSE_EXCEED_SHARED);

Can be done as a later cleanup, but having a single function that obtains an
order and knows which stats to update would be cleaner (and a good preparation
for shmem mTHP collapse support).

Nothing jumped at me, so

Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>

--
Cheers,

David