Re: [PATCH v8 00/10] Multi-size THP for anonymous memory

From: Ryan Roberts
Date: Tue Dec 05 2023 - 06:13:28 EST


On 05/12/2023 03:37, John Hubbard wrote:
> On 12/4/23 02:20, Ryan Roberts wrote:
>> Hi All,
>>
>> A new week, a new version, a new name... This is v8 of a series to implement
>> multi-size THP (mTHP) for anonymous memory (previously called "small-sized THP"
>> and "large anonymous folios"). Matthew objected to "small huge" so hopefully
>> this fares better.
>>
>> The objective of this is to improve performance by allocating larger chunks of
>> memory during anonymous page faults:
>>
>> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>>     pages, there are efficiency savings to be had; fewer page faults, batched PTE
>>     and RMAP manipulation, shorter LRU lists, etc. In short, we reduce kernel
>>     overhead. This should benefit all architectures.
>> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>>     advantage of HW TLB compression techniques. A reduction in TLB pressure
>>     speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>>     TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>>
>> This version changes the name and tidies up some of the kernel code and test
>> code, based on feedback against v7 (see change log for details).
>
> Using a couple of Armv8 systems, I've tested this patchset. I applied it
> to top of tree (Linux 6.7-rc4), on top of your latest contig pte series
> [1].
>
> With those two patchsets applied, the mm selftests look OK--or at least
> as OK as they normally do. I compared test runs between THP/mTHP set to
> "always", vs "never", to verify that there were no new test failures.
> Details: I set one particular page size (2 MB) to
> "inherit", and then toggled /sys/kernel/mm/transparent_hugepage/enabled
> between "always" and "never".
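
(For anyone reproducing the toggling described above, here is a minimal
sketch. It assumes the per-size sysfs layout this series introduces, i.e.
hugepages-<size>kB/enabled under /sys/kernel/mm/transparent_hugepage. The
helper names and the `root` parameter are hypothetical, added only for
illustration; this is not the script John ran, and writing to the real
sysfs files needs root.)

```python
from pathlib import Path

# Assumed location of the THP controls; overridable so the helpers can be
# exercised against a scratch directory instead of the live sysfs.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")

GLOBAL_MODES = {"always", "madvise", "never"}
PER_SIZE_MODES = GLOBAL_MODES | {"inherit"}


def set_thp_mode(mode, size_kb=None, root=THP_ROOT):
    """Write `mode` to the top-level control, or to one per-size control.

    size_kb=None targets <root>/enabled; otherwise it targets
    <root>/hugepages-<size_kb>kB/enabled.
    """
    root = Path(root)
    if size_kb is None:
        allowed, target = GLOBAL_MODES, root / "enabled"
    else:
        allowed = PER_SIZE_MODES
        target = root / f"hugepages-{size_kb}kB" / "enabled"
    if mode not in allowed:
        raise ValueError(f"{mode!r} is not valid for {target}")
    target.write_text(mode)
    return target


def configure_for_run(global_mode, size_kb=2048, root=THP_ROOT):
    """Set one size to "inherit", then drive it via the top-level control."""
    set_thp_mode("inherit", size_kb=size_kb, root=root)
    set_thp_mode(global_mode, root=root)
```

Calling configure_for_run("always") before one selftest run and
configure_for_run("never") before the other would mirror the comparison
above for the 2 MB size.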

Excellent - I'm guessing this was for 64K base pages?

>
> I also re-ran my usual compute/AI benchmark, and I'm still seeing the
> same 10x performance improvement that I reported for the v6 patchset.
>
> So for this patchset and for [1] as well, please feel free to add:
>
> Tested-by: John Hubbard <jhubbard@xxxxxxxxxx>

Thanks!

>
>
> [1] https://lore.kernel.org/all/20231204105440.61448-1-ryan.roberts@xxxxxxx/
>
>
> thanks,