[RFC 0/2] mm: introduce THP deferred setting

From: Nico Pache
Date: Mon Jul 29 2024 - 18:28:04 EST


We've seen cases were customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /transparent_hugepages/enabled=never. This is
in part due to performance degradations and increased memory waste.

This series introduces enabled=defer, this setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the allocation is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages thats not MADV_HUGEPAGE.

This allows for two things... one, applications specifically designed to
use hugepages will get them, and two, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste, and defers
the use of hugepages to khugepaged. Khugepaged can then scan the memory
for eligible collapsing.

Admins may want to lower max_ptes_none, if not, khugepaged may
aggressively collapse single allocations into hugepages.

RFC note
==========
Im not sure if im missing anything related to the mTHP
changes. I think now that we have hugepage_pmd_enabled in
commit 00f58104202c ("mm: fix khugepaged activation policy") everything
should work as expected.

Nico Pache (2):
mm: defer THP insertion to khugepaged
mm: document transparent_hugepage=defer usage

Documentation/admin-guide/mm/transhuge.rst | 18 ++++++++++---
include/linux/huge_mm.h | 15 +++++++++--
mm/huge_memory.c | 31 +++++++++++++++++++---
3 files changed, 55 insertions(+), 9 deletions(-)

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Barry Song <baohua@xxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Lance Yang <ioworker0@xxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Cc: Rafael Aquini <aquini@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
--
2.45.2