Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

From: Ryan Roberts
Date: Tue Oct 15 2024 - 07:49:06 EST


On 14/10/2024 18:32, Florian Fainelli wrote:
> On 10/14/24 03:55, Ryan Roberts wrote:
>> Hi All,
>>
>> Patch bomb incoming... This covers many subsystems, so I've included a core set
>> of people on the full series and additionally included maintainers on relevant
>> patches. I haven't included those maintainers on this cover letter since the
>> numbers were far too big for it to work. But I've included a link to this cover
>> letter on each patch, so they can hopefully find their way here. For follow up
>> submissions I'll break it up by subsystem, but for now thought it was important
>> to show the full picture.
>>
>> This RFC series implements support for boot-time page size selection within the
>> arm64 kernel. arm64 supports 3 base page sizes (4K, 16K, 64K), but to date, page
>> size has been selected at compile-time, meaning the size is baked into a given
>> kernel image. As use of larger-than-4K page sizes become more prevalent this
>> starts to present a problem for distributions. Boot-time page size selection
>> enables the creation of a single kernel image, which can be told which page size
>> to use on the kernel command line.
>>
>> Why is having an image-per-page size problematic?
>> =================================================
>>
>> Many traditional distros are now supporting both 4K and 64K. And this means
>> managing 2 kernel packages, along with drivers for each. For some, it means
>> multiple installer flavours and multiple ISOs. All of this adds up to a
>> less-than-ideal level of complexity. Additionally, Android now supports 4K and
>> 16K kernels. I'm told having to explicitly manage their KABI for each kernel is
>> painful, and the extra flash space required for both kernel images and the
>> duplicated modules has been problematic. Boot-time page size selection solves
>> all of this.
>>
>> Additionally, in starting to think about the longer term deployment story for
>> D128 page tables, which Arm architecture now supports, a lot of the same
>> problems need to be solved, so this work sets us up nicely for that.
>>
>> So what's the down side?
>> ========================
>>
>> Well nothing's free; Various static allocations in the kernel image must be
>> sized for the worst case (largest supported page size), so image size is in line
>> with size of 64K compile-time image. So if you're interested in 4K or 16K, there
>> is a slight increase to the image size. But I expect that problem goes away if
>> you're compressing the image - its just some extra zeros. At boot-time, I expect
>> we could free the unused static storage once we know the page size - although
>> that would be a follow up enhancement.
>>
>> And then there is performance. Since PAGE_SIZE and friends are no longer
>> compile-time constants, we must look up their values and do arithmetic at
>> runtime instead of compile-time. My early perf testing suggests this is
>> inperceptible for real-world workloads, and only has small impact on
>> microbenchmarks - more on this below.
>>
>> Approach
>> ========
>>
>> The basic idea is to rid the source of any assumptions that PAGE_SIZE and
>> friends are compile-time constant, but in a way that allows the compiler to
>> perform the same optimizations as was previously being done if they do turn out
>> to be compile-time constant. Where constants are required, we use limits;
>> PAGE_SIZE_MIN and PAGE_SIZE_MAX. See commit log in patch 1 for full description
>> of all the classes of problems to solve.
>>
>> By default PAGE_SIZE_MIN=PAGE_SIZE_MAX=PAGE_SIZE. But an arch may opt-in to
>> boot-time page size selection by defining PAGE_SIZE_MIN & PAGE_SIZE_MAX. arm64
>> does this if the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig,
>> which is an alternative to selecting a compile-time page size.
>>
>> When boot-time page size is active, the arch pgtable geometry macro definitions
>> resolve to something that can be configured at boot. The arm64 implementation in
>> this series mainly uses global, __ro_after_init variables. I've tried using
>> alternatives patching, but that performs worse than loading from memory; I think
>> due to code size bloat.
>
> FWIW, this paragraph was not entirely clear to me until I looked at patch 57 to
> see that the compile time page size selection had been retained, and could
> continue to be used as-is. It was somewhat implicit, but not IMHO explicit
> enough, not a big deal though.

I intended to make that bit clear with the above sentance "arm64 does this if
the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig, which is an
alternative to selecting a compile-time page size.", but appreciate there is a
lot going on here.

>
> Great work, thanks for doing that! This makes me wonder if we could leverage any
> of that to have a single kernel supporting both LPAE and !LPAE on ARM 32-bit,
> but that still seems like somewhat more difficult, largely due to the difference
> in the page table descriptor format (long vs. short).

We will eventually have the exact same problem with FEAT_D128 on arm64. This
introduces page tables with 128 bit PTEs. Ideally we would like to support both
in a single image, although, we have much more thinking to do on that. But my
current view is that this series solves a bunch of problems that makes it easier
(PTRS_PER_Pxx and Pxx_SHIFT all become boot-time values, for example, so we can
easily represent the different geometries).

Yes, we still need to solve the PTE size difference (in our case 64-bit vs
128-bit). I have a couple of proposals for how to do that; the "gold-plated"
approach would be to create and use a handle type to represent a PTE/PxD slot in
a table. Then increments/decrements would be enforced via explicit helpers that
know the size, and direct dereferencing would be impossible. When accessing via
helpers we would pass around pte_t/pxd_t values that are the larger size, then
narrow then when writing back.

Anshuman has a series [1] that starts to move in that direction.

If you have any other ideas, it would be good to talk!

[1]
https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@xxxxxxx/

Thanks,
Ryan