Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache memory

From: Ryan Roberts
Date: Thu Aug 08 2024 - 06:27:29 EST


On 17/07/2024 08:12, Ryan Roberts wrote:
> Hi All,
>
> This series is an RFC that adds sysfs and kernel cmdline controls to configure
> the set of allowed large folio sizes that can be used when allocating
> file-memory for the page cache. As part of the control mechanism, it provides
> for a special-case "preferred folio size for executable mappings" marker.
>
> I'm trying to solve 2 separate problems with this series:
>
> 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
> approach for the change at [1]. Instead of hardcoding the preferred executable
> folio size into the arch, user space can now select it. This decouples the arch
> code and also makes the mechanism more generic; it can be bypassed (the default)
> or any folio size can be set. For my use case, 64K is preferred, but I've also
> heard from Willy of a use case where putting all text into 2M PMD-sized folios
> is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
> therefore faulting in all text ahead of time) to achieve that.
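
To put rough numbers on the iTLB argument in (1), here is a back-of-the-envelope
sketch. It assumes the text is naturally aligned and that the hardware can
cover each folio-sized block with a single TLB entry (e.g. the arm64 contiguous
hint for 64K with 4K base pages, or a PMD mapping for 2M). The 32 MiB text size
and the function name are made up for illustration; real TLB behaviour depends
on the microarchitecture.

```python
# Illustrative arithmetic only: how many TLB entries are needed to cover a
# given amount of executable text at different mapping sizes.

def tlb_entries_needed(text_bytes, mapping_bytes):
    """Return the number of entries needed if each entry maps mapping_bytes.

    Rounds up, since a partially used block still consumes a whole entry.
    """
    return -(-text_bytes // mapping_bytes)  # ceiling division


if __name__ == "__main__":
    text_bytes = 32 * 1024 * 1024  # hypothetical 32 MiB of hot text
    for label, size in (("4K", 4 << 10), ("64K", 64 << 10), ("2M", 2 << 20)):
        print(f"{label:>3} mappings: {tlb_entries_needed(text_bytes, size)} entries")
```

For the hypothetical 32 MiB of text this gives 8192 entries at 4K, 512 at 64K
and 16 at 2M, which is the kind of reduction in iTLB pressure the series is
after.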

Just a polite bump on this; I'd really like to get something like this merged to
help reduce iTLB pressure. We had a discussion at the THP Cabal meeting a few
weeks back without a solid conclusion. I haven't heard any concrete objections
yet, but also only a lukewarm reception. How can I move this forwards?

Thanks,
Ryan


>
> 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
> Android): The theory goes that if all folios are 64K, then failure to allocate a
> 64K folio should become unlikely. But if the page cache is allocating lots of
> different orders, with most allocations being smaller than 64K (as is the
> case today) then the ability to allocate 64K folios diminishes. By providing control
> over the allowed set of folio sizes, we can tune to avoid crucial 64K folio
> allocation failure. Additionally I've heard (second hand) of the need to disable
> large folios in the page cache entirely due to latency concerns in some
> settings. These controls allow all of this without kernel changes.
>
> The value of (1) is clear and the performance improvements are documented in
> patch 2. I don't yet have any data demonstrating the theory for (2) since I
> can't reproduce the setup that Barry had at [2]. But my view is that by adding
> these controls we will enable the community to explore further, in the same way
> that the anon mTHP controls helped harden the understanding for anonymous
> memory.
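
For reference, the existing anon mTHP controls mentioned above can be inspected
with something like the sketch below. It only reads the per-size "enabled"
files that already exist under /sys/kernel/mm/transparent_hugepage/ for
anonymous memory; it does not touch the file-backed controls proposed in this
series. The helper names are made up for illustration, and the base directory
is a parameter so the parsing can be tried against a fake tree.

```python
# Sketch: report the selected "enabled" policy for each anon mTHP size.
# Each file lists all options with the active one in brackets, e.g.
# "always inherit madvise [never]".
import os
import re


def selected_value(contents):
    """Return the bracketed (currently selected) option, or None."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


def read_mthp_enabled(base="/sys/kernel/mm/transparent_hugepage"):
    """Map folio size in kB -> selected policy for each hugepages-<N>kB dir."""
    result = {}
    if not os.path.isdir(base):
        return result  # kernel without THP, or not running on Linux
    for name in sorted(os.listdir(base)):
        match = re.fullmatch(r"hugepages-(\d+)kB", name)
        if not match:
            continue
        try:
            with open(os.path.join(base, name, "enabled")) as f:
                result[int(match.group(1))] = selected_value(f.read())
        except OSError:
            continue  # skip sizes we cannot read
    return result


if __name__ == "__main__":
    for size_kb, policy in sorted(read_mthp_enabled().items()):
        print(f"{size_kb}kB: {policy}")
```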
>
> ---
> This series depends on the "mTHP allocation stats for file-backed memory" series
> at [3], which itself applies on top of yesterday's mm-unstable (650b6752c8a3). All
> mm selftests have been run; no regressions were observed.
>
> [1] https://lore.kernel.org/linux-mm/20240215154059.2863126-1-ryan.roberts@xxxxxxx/
> [2] https://www.youtube.com/watch?v=ht7eGWqwmNs&list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&index=4
> [3] https://lore.kernel.org/linux-mm/20240716135907.4047689-1-ryan.roberts@xxxxxxx/
>
> Thanks,
> Ryan
>
> Ryan Roberts (4):
>   mm: mTHP user controls to configure pagecache large folio sizes
>   mm: Introduce "always+exec" for mTHP file_enabled control
>   mm: Override mTHP "enabled" defaults at kernel cmdline
>   mm: Override mTHP "file_enabled" defaults at kernel cmdline
>
>  .../admin-guide/kernel-parameters.txt       |  16 ++
>  Documentation/admin-guide/mm/transhuge.rst  |  66 +++++++-
>  include/linux/huge_mm.h                     |  61 ++++---
>  mm/filemap.c                                |  26 ++-
>  mm/huge_memory.c                            | 158 +++++++++++++++++-
>  mm/readahead.c                              |  43 ++++-
>  6 files changed, 329 insertions(+), 41 deletions(-)
>
> --
> 2.43.0
>