Re: [RFC] mm: restrict zero-page remapping to underused THP splits

From: David Hildenbrand (Arm)

Date: Mon May 11 2026 - 09:46:05 EST


On 5/11/26 15:10, Usama Arif wrote:
>
>
> On 11/05/2026 07:36, David Hildenbrand (Arm) wrote:
>>
>>>
>>> Hello!
>>
>>
>> Hi!
>>
>>>
>>> I think (3) definitely makes sense.
>>>
>>> I have not had a deep look at KSM until just now, so some of the below
>>> might be dumb :)
>>>
>>> What I see is that KSM scans THPs as 512 individual 4K subpages and splits the
>>> THP whenever it actually wants to merge a single 4K chunk. That seems like a
>>> lot of work for a single 4K page?
>>
>> Yes, but that's what the users ask for: if there is a chance to deduplicate
>> memory, it shall be deduplicated asap.
>>
>>>
>>> One thing that came to my mind is to have a separate tree for THPs and only
>>> merge the THPs that have the same content, but the probability of encountering
>>> 2M pages with the same content is extremely low, so this is probably a bad idea.
>>
>> Right, the probability is low, and it would change existing semantics, breaking
>> existing users.
>>
>> In addition, we would have to add large folio support to KSM, which I would
>> rather avoid.
>>
>>>
>>> An alternative: does it even make sense for KSM to process and split THPs
>>> the way it does now? IMO this is a lot of work for a single 4K merge.
>>> The shrinker is designed to release memory when it's needed, i.e. during
>>> reclaim, at which point IMO free memory is more important than performance.
>>> But KSM runs all the time, so constantly splitting THPs every time a single
>>> 4K page can be merged hurts performance all the time.
>>
>> Right, but that's what you get with KSM: bad performance if there is a chance to
>> deduplicate :)
>>
>> (and bad performance from scanning overhead)
>>
>>> If someone cares about memory,
>>> they should be running the shrinker.
>>
>> It's not just the zero page, but really any page content. The zero page is
>> only "special" because we added conditional support to KSM for deduplicating
>> to the shared zeropage. The shrinker doesn't help for any page content other
>> than zero-filled pages.
>>
>> Further, the shrinker is system-wide, whereas KSM is usually only
>> enabled for selected VMAs (with some exceptions nowadays).
>>
>> Also note that KSM deduplicates independently of the folio size: not just THPs,
>> but really any (large) folio. Yes, it splits large folios, but that's really
>> just to keep the T in THP.
>>
>>> Would a better alternative be that KSM skips THPs, the THP shrinker splits
>>> THPs into 4K subpages when memory is needed, and only then does KSM get
>>> those 4K subpages?
>>>
>>> Above sounds like reworking KSM, but just wanted to put it out there.
>>
>> Right, and it makes KSM more THP aware. Which is something I would avoid right now.
>>
>>>
>>> (2) + (3) sounds like a good solution, but I wonder if the above alternative
>>> of KSM just skipping THPs might be better?
>>
>> That would change the semantics, for example where we expect memory to have
>> been deduplicated after a KSM run.
>>
>> VMs (where KSM is usually employed) are expected to be mostly backed by THPs:
>> except where we can deduplicate memory. Skipping THPs would essentially break
>> the main use case for KSM :)
>>
>> Does that make sense?
>>
>
> Yes, all of the above makes sense. But I feel like this means one should not
> set the THP policy to "always" and enable KSM at the same time.

IIRC, QEMU by default sets MADV_HUGEPAGE and MADV_MERGEABLE :)

(KSM itself still has to be enabled manually at the system level.)
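A minimal sketch of that split, for illustration only. It is not QEMU's code; it assumes Python 3.8+ on Linux, and the helpers map_guest_ram() and ksm_run_state() are hypothetical names. The VMM side only marks its private anonymous RAM with madvise(); whether KSM actually scans anything depends on the global /sys/kernel/mm/ksm/run switch.

```python
import mmap

# Assumption: 2 MiB PMD-sized THP (e.g. x86-64 with 4 KiB base pages).
THP_SIZE = 2 * 1024 * 1024


def map_guest_ram(length: int = THP_SIZE) -> dict:
    """Hypothetical helper: map private anonymous memory and advise THP + KSM
    on it, roughly what a VMM does for guest RAM. Returns outcome per advice."""
    # fileno=-1 makes CPython create an anonymous mapping. KSM only merges
    # private anonymous pages, so request MAP_PRIVATE explicitly.
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    results = {}
    try:
        for name in ("MADV_HUGEPAGE", "MADV_MERGEABLE"):
            advice = getattr(mmap, name, None)
            if advice is None:
                # Constant not exposed by this Python build/platform.
                results[name] = "unsupported"
                continue
            try:
                buf.madvise(advice)  # applies to the whole mapping
                results[name] = "ok"
            except OSError as exc:
                # e.g. EINVAL if the kernel lacks THP or KSM support
                results[name] = f"failed (errno {exc.errno})"
    finally:
        buf.close()
    return results


def ksm_run_state(path: str = "/sys/kernel/mm/ksm/run"):
    """Read the system-wide KSM switch: 0 = stopped, 1 = running,
    2 = stop and unmerge all merged pages. Returns None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(map_guest_ram())
    print("ksm run =", ksm_run_state())
```

Even when both madvise() calls succeed, nothing gets merged until an administrator writes 1 to /sys/kernel/mm/ksm/run.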

> In general, I feel like KSM
> is not something that should be run on big servers, as hopefully you are
> not managing memory in 4K chunks on big machines and are using a lot of THPs instead.

Right. But the 4k chunks are movable and compaction can move them around to
create THPs elsewhere.

--
Cheers,

David