Re: [PATCH mm-unstable v15 13/13] Documentation: mm: update the admin guide for mTHP collapse

From: Nico Pache

Date: Wed Mar 18 2026 - 15:09:35 EST

On 3/17/26 5:02 AM, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Feb 25, 2026 at 08:27:06PM -0700, Nico Pache wrote:
>> Now that we can collapse to mTHPs, let's update the admin guide to
>> reflect these changes and provide proper guidance on how to use it.
>>
>> Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx>
>> Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
>
> LGTM, but maybe we should mention mTHP's max_ptes_none behaviour
> somewhere?

IIRC we decided to strictly leave that out of the manual. I used to have it in
here. @david?
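
For reference, the max_ptes_none rule the patch documents for mTHP collapse (only 0 or HPAGE_PMD_NR - 1 are supported) can be checked from userspace with a small shell helper. This is a hypothetical sketch, not part of the kernel or of this series. The function name is made up, and it assumes HPAGE_PMD_NR equals hpage_pmd_size divided by the base page size (512 with 4K pages and 2M PMDs).

```shell
# Hypothetical helper (not part of the kernel or this patch).
# Returns 0 if MAX_PTES_NONE is one of the values the patch documents as
# supported for mTHP collapse, i.e. 0 or (HPAGE_PMD_NR - 1); returns 1
# otherwise, in which case no mTHP collapse will be attempted.
# Usage: mthp_max_ptes_none_ok MAX_PTES_NONE HPAGE_PMD_NR
mthp_max_ptes_none_ok() {
	max_ptes_none=$1
	pmd_nr=$2
	if [ "$max_ptes_none" -eq 0 ] || [ "$max_ptes_none" -eq $((pmd_nr - 1)) ]; then
		return 0
	fi
	return 1
}

# Example: check the live setting, if the sysfs files are present.
# Assumption: HPAGE_PMD_NR = hpage_pmd_size / base page size.
thp=/sys/kernel/mm/transparent_hugepage
if [ -r "$thp/hpage_pmd_size" ] && [ -r "$thp/khugepaged/max_ptes_none" ]; then
	pmd_nr=$(( $(cat "$thp/hpage_pmd_size") / $(getconf PAGESIZE) ))
	val=$(cat "$thp/khugepaged/max_ptes_none")
	if mthp_max_ptes_none_ok "$val" "$pmd_nr"; then
		echo "max_ptes_none=$val: mTHP collapse may be attempted"
	else
		echo "max_ptes_none=$val: no mTHP collapse will be attempted"
	fi
fi
```

With 4K pages, the default of 511 and the value 0 both pass; something like 256 would fail the check.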

>
> Anyway with that addressed:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx>
>
>> ---
>> Documentation/admin-guide/mm/transhuge.rst | 48 +++++++++++++---------
>> 1 file changed, 28 insertions(+), 20 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index eebb1f6bbc6c..67836c683e8d 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -63,7 +63,8 @@ often.
>> THP can be enabled system wide or restricted to certain tasks or even
>> memory ranges inside task's address space. Unless THP is completely
>> disabled, there is ``khugepaged`` daemon that scans memory and
>> -collapses sequences of basic pages into PMD-sized huge pages.
>> +collapses sequences of basic pages into huge pages of either PMD size
>> +or mTHP sizes, if the system is configured to do so.
>>
>> The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>> interface and using madvise(2) and prctl(2) system calls.
>> @@ -219,10 +220,10 @@ this behaviour by writing 0 to shrink_underused, and enable it by writing
>> echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
>> echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
>>
>> -khugepaged will be automatically started when PMD-sized THP is enabled
>> +khugepaged will be automatically started when any THP size is enabled
>> (either of the per-size anon control or the top-level control are set
>> to "always" or "madvise"), and it'll be automatically shutdown when
>> -PMD-sized THP is disabled (when both the per-size anon control and the
>> +all THP sizes are disabled (when both the per-size anon control and the
>> top-level control are "never")
>>
>> process THP controls
>> @@ -264,11 +265,6 @@ support the following arguments::
>> Khugepaged controls
>> -------------------
>>
>> -.. note::
>> - khugepaged currently only searches for opportunities to collapse to
>> - PMD-sized THP and no attempt is made to collapse to other THP
>> - sizes.
>> -
>> khugepaged runs usually at low frequency so while one may not want to
>> invoke defrag algorithms synchronously during the page faults, it
>> should be worth invoking defrag at least in khugepaged. However it's
>> @@ -296,11 +292,11 @@ allocation failure to throttle the next allocation attempt::
>> The khugepaged progress can be seen in the number of pages collapsed (note
>> that this counter may not be an exact count of the number of pages
>> collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping
>> -being replaced by a PMD mapping, or (2) All 4K physical pages replaced by
>> -one 2M hugepage. Each may happen independently, or together, depending on
>> -the type of memory and the failures that occur. As such, this value should
>> -be interpreted roughly as a sign of progress, and counters in /proc/vmstat
>> -consulted for more accurate accounting)::
>> +being replaced by a PMD mapping, or (2) physical pages replaced by one
>> +hugepage of various sizes (PMD-sized or mTHP). Each may happen independently,
>> +or together, depending on the type of memory and the failures that occur.
>> +As such, this value should be interpreted roughly as a sign of progress,
>> +and counters in /proc/vmstat consulted for more accurate accounting)::
>>
>> /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
>>
>> @@ -308,16 +304,19 @@ for each pass::
>>
>> /sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
>>
>> -``max_ptes_none`` specifies how many extra small pages (that are
>> -not already mapped) can be allocated when collapsing a group
>> -of small pages into one large page::
>> +``max_ptes_none`` specifies how many empty (none/zero) pages are allowed
>> +when collapsing a group of small pages into one large page::
>>
>> /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
>>
>> -A higher value leads to use additional memory for programs.
>> -A lower value leads to gain less thp performance. Value of
>> -max_ptes_none can waste cpu time very little, you can
>> -ignore it.
>> +For PMD-sized THP collapse, this directly limits the number of empty pages
>> +allowed in the 2MB region. For mTHP collapse, only 0 or (HPAGE_PMD_NR - 1)
>> +are supported. Any other value will emit a warning and no mTHP collapse
>> +will be attempted.
>> +
>> +A higher value allows more empty pages, potentially leading to more memory
>> +usage but better THP performance. A lower value is more conservative and
>> +may result in fewer THP collapses.
>>
>> ``max_ptes_swap`` specifies how many pages can be brought in from
>> swap when collapsing a group of pages into a transparent huge page::
>> @@ -337,6 +336,15 @@ that THP is shared. Exceeding the number would block the collapse::
>>
>> A higher value may increase memory footprint for some workloads.
>>
>> +.. note::
>> + For mTHP collapse, khugepaged does not support collapsing regions that
>> + contain shared or swapped out pages, as this could lead to continuous
>> + promotion to higher orders. The collapse will fail if any shared or
>> + swapped PTEs are encountered during the scan.
>> +
>> + Currently, madvise_collapse only supports collapsing to PMD-sized THPs
>> + and does not attempt mTHP collapses.
>> +
>> Boot parameters
>> ===============
>>
>> --
>> 2.53.0
>>
>
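
The khugepaged start/stop rule in the second hunk can be sketched as a shell check over the sysfs controls. This hypothetical helper mirrors only the documented wording. It reports "expected to run" if the top-level control or any per-size hugepages-<size>kB/enabled control selects "always" or "madvise". It does not reproduce the kernel's internal logic, which may also take other settings into account (for example file-backed or shmem THP). The function names are made up for illustration.

```shell
# Print the selected (bracketed) mode of a THP "enabled" file,
# e.g. "always [madvise] never" -> "madvise".
thp_selected_mode() {
	sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$1"
}

# Hypothetical helper: returns 0 if, per the documented rule, khugepaged
# should be running, i.e. the top-level control or any per-size anon
# control is set to "always" or "madvise". ROOT is normally
# /sys/kernel/mm/transparent_hugepage.
khugepaged_expected() {
	root=$1
	for f in "$root/enabled" "$root"/hugepages-*kB/enabled; do
		# Skip missing files (including an unmatched glob pattern).
		[ -r "$f" ] || continue
		case $(thp_selected_mode "$f") in
		always|madvise) return 0 ;;
		esac
	done
	return 1
}

if khugepaged_expected /sys/kernel/mm/transparent_hugepage; then
	echo "khugepaged expected to run"
else
	echo "khugepaged expected to be stopped"
fi
```

Taking the directory as a parameter keeps the check testable against a copy of the sysfs layout.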