Re: [RFC PATCH 25/28] x86: Use PIE codegen for the core kernel

From: Ard Biesheuvel
Date: Wed Oct 02 2024 - 11:28:17 EST


Hi Peter,

Thanks for taking a look.

On Tue, 1 Oct 2024 at 23:13, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 9/25/24 08:01, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> >
> > As an intermediate step towards enabling PIE linking for the 64-bit x86
> > kernel, enable PIE codegen for all objects that are linked into the
> > kernel proper.
> >
> > This substantially reduces the number of relocations that need to be
> > processed when booting a relocatable KASLR kernel.
> >
>
> This really seems like going completely backwards to me.
>
> You are imposing a more restrictive code model on the kernel, optimizing
> for boot time in a way that will exert a permanent cost on the running
> kernel.
>

Fair point about the boot time. This is not the only concern, though,
and arguably the least important one.

As I responded to Andi before, it is also about using a code model and
relocation model that matches the reality of how the code is executed:
- the early C code runs from the 1:1 mapping, and needs special hacks
to accommodate this
- KASLR runs the kernel from a different virtual address than the one
we told the linker about

> There is a *huge* difference between the kernel and user space here:
>
> KERNEL MEMORY IS PERMANENTLY ALLOCATED, AND IS NEVER SHARED.
>

No need to shout.

> Dirtying user pages requires them to be unshared and dirty, which is
> undesirable. Kernel pages are *always* unshared and dirty.
>

I guess you are referring to the use of a GOT? That is a valid
concern, but it does not apply here. With hidden visibility and
compiler command line options like -mdirect-access-extern, all emitted
symbol references are direct. Disallowing text relocations could be
trivially enabled with this series if desired, and actually helps
avoid the tricky bugs we keep fixing in the early startup code that
executes from the 1:1 mapping (the C code in .head.text)

So it mostly comes down to minor differences in addressing modes, e.g.,

movq $sym, %reg

actually uses more bytes than

leaq sym(%rip), %reg

whereas

movq sym, %reg

and

movq sym(%rip), %reg

are the same length.

OTOH, indexing a statically allocated global array like

movl array(,%reg1,4), %reg2

will be converted into

leaq array(%rip), %reg2
movl (%reg2,%reg1,4), %reg2

and is therefore less efficient in terms of code footprint. But in
general, the x86_64 ISA and psABI are quite flexible in this regard,
and extrapolating from past experiences with PIC code on i386 is not
really justified here.

As Andi also pointed out, what ultimately matters is the performance,
as well as code size where it impacts performance, through the I-cache
footprint. I'll do some testing before reposting, and maybe not bother
if the impact is negative.

> > It also brings us much closer to the ordinary PIE relocation model used
> > for most of user space, which is therefore much better supported and
> > less likely to create problems as we increase the range of compilers and
> > linkers that need to be supported.
>
> We have been resisting *for ages* making the kernel worse to accomodate
> broken compilers. We don't "need" to support more compilers -- we need
> the compilers to support us. We have working compilers; any new compiler
> that wants to play should be expected to work correctly.
>

We are in a much better place now than we were before in that regard,
which is actually how this effort came about: instead of lying to the
compiler, and maintaining our own pile of scripts and relocation
tools, we can just do what other arches are doing in Linux, and let
the toolchain do it for us.