Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

From: Florian Fainelli
Date: Mon Oct 14 2024 - 13:33:05 EST


On 10/14/24 03:55, Ryan Roberts wrote:
Hi All,

Patch bomb incoming... This covers many subsystems, so I've included a core set
of people on the full series and additionally included maintainers on relevant
patches. I haven't included those maintainers on this cover letter since the
numbers were far too big for it to work. But I've included a link to this cover
letter on each patch, so they can hopefully find their way here. For follow up
submissions I'll break it up by subsystem, but for now thought it was important
to show the full picture.

This RFC series implements support for boot-time page size selection within the
arm64 kernel. arm64 supports 3 base page sizes (4K, 16K, 64K), but to date, page
size has been selected at compile-time, meaning the size is baked into a given
kernel image. As use of larger-than-4K page sizes becomes more prevalent, this
starts to present a problem for distributions. Boot-time page size selection
enables the creation of a single kernel image, which can be told which page size
to use on the kernel command line.

Why is having an image-per-page size problematic?
=================================================

Many traditional distros are now supporting both 4K and 64K. And this means
managing 2 kernel packages, along with drivers for each. For some, it means
multiple installer flavours and multiple ISOs. All of this adds up to a
less-than-ideal level of complexity. Additionally, Android now supports 4K and
16K kernels. I'm told having to explicitly manage their KABI for each kernel is
painful, and the extra flash space required for both kernel images and the
duplicated modules has been problematic. Boot-time page size selection solves
all of this.

Additionally, in starting to think about the longer term deployment story for
D128 page tables, which the Arm architecture now supports, a lot of the same
problems need to be solved, so this work sets us up nicely for that.

So what's the down side?
========================

Well, nothing's free. Various static allocations in the kernel image must be
sized for the worst case (largest supported page size), so the image size is in
line with that of a 64K compile-time image. So if you're interested in 4K or
16K, there is a slight increase to the image size. But I expect that problem
goes away if you're compressing the image - it's just some extra zeros. At
boot-time, I expect we could free the unused static storage once we know the
page size - although that would be a follow up enhancement.

And then there is performance. Since PAGE_SIZE and friends are no longer
compile-time constants, we must look up their values and do arithmetic at
runtime instead of compile-time. My early perf testing suggests this is
imperceptible for real-world workloads, and has only a small impact on
microbenchmarks - more on this below.

Approach
========

The basic idea is to rid the source of any assumptions that PAGE_SIZE and
friends are compile-time constants, but in a way that allows the compiler to
perform the same optimizations as before if they do turn out to be
compile-time constants. Where constants are required, we use limits;
PAGE_SIZE_MIN and PAGE_SIZE_MAX. See commit log in patch 1 for full description
of all the classes of problems to solve.

By default PAGE_SIZE_MIN=PAGE_SIZE_MAX=PAGE_SIZE. But an arch may opt-in to
boot-time page size selection by defining PAGE_SIZE_MIN & PAGE_SIZE_MAX. arm64
does this if the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig option,
which is an alternative to selecting a compile-time page size.
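
To illustrate the limits idea, here is a minimal userspace sketch, not code
from the series. Only PAGE_SIZE, PAGE_SIZE_MIN and PAGE_SIZE_MAX are named in
the cover letter; PAGE_SHIFT_MIN, PAGE_SHIFT_MAX, page_shift,
select_page_size(), page_align() and fill_scratch_page() are hypothetical names
made up for this example. It assumes the opted-in case; in the default case all
three macros would simply be the same compile-time constant.

```c
#include <assert.h>
#include <string.h>

/* Assumed limits for an arch that opts in: 4K minimum, 64K maximum. */
#define PAGE_SHIFT_MIN 12
#define PAGE_SHIFT_MAX 16

/* Hypothetical global, written once during early "boot", read afterwards. */
static unsigned int page_shift = PAGE_SHIFT_MIN;

/* PAGE_SIZE is now a runtime value; the limits remain compile-time constants. */
#define PAGE_SIZE     (1UL << page_shift)
#define PAGE_SIZE_MIN (1UL << PAGE_SHIFT_MIN)
#define PAGE_SIZE_MAX (1UL << PAGE_SHIFT_MAX)

/* Static storage needs a constant size, so it is sized for the worst case. */
static unsigned char scratch_page[PAGE_SIZE_MAX];

/* Hypothetical stand-in for parsing a page size given on the command line. */
static int select_page_size(const char *arg)
{
	if (!strcmp(arg, "4K"))
		page_shift = 12;
	else if (!strcmp(arg, "16K"))
		page_shift = 14;
	else if (!strcmp(arg, "64K"))
		page_shift = 16;
	else
		return -1;	/* unsupported size: leave page_shift unchanged */
	return 0;
}

/* Round addr up to the next page boundary; the mask is computed at runtime. */
static unsigned long page_align(unsigned long addr)
{
	return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Clear only the part of the worst-case buffer that the selected size uses. */
static unsigned long fill_scratch_page(void)
{
	assert(PAGE_SIZE <= sizeof(scratch_page));
	memset(scratch_page, 0, PAGE_SIZE);
	return PAGE_SIZE;
}
```

With this sketch, select_page_size("16K") followed by page_align(5000) gives
16384, while scratch_page stays 64K in size regardless of the selection, which
is where the image size cost described above comes from.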

When boot-time page size is active, the arch pgtable geometry macro definitions
resolve to something that can be configured at boot. The arm64 implementation in
this series mainly uses global, __ro_after_init variables. I've tried using
alternatives patching, but that performs worse than loading from memory; I think
due to code size bloat.

FWIW, this paragraph was not entirely clear to me until I looked at patch 57 and saw that compile-time page size selection had been retained and could continue to be used as-is. That was implied, but IMHO not stated explicitly enough. Not a big deal, though.

Great work, thanks for doing that! This makes me wonder whether we could leverage any of it to have a single kernel supporting both LPAE and !LPAE on 32-bit ARM. That still seems somewhat more difficult, though, largely due to the difference in the page table descriptor format (long vs. short).
--
Florian