Re: [PATCH] arm64: jump_label: use constraint "S" instead of "i"

From: Ard Biesheuvel
Date: Thu Feb 01 2024 - 03:09:39 EST


On Thu, 1 Feb 2024 at 05:55, Fangrui Song <maskray@xxxxxxxxxx> wrote:
>
> On 2024-01-31, Dave Martin wrote:
> >On Wed, Jan 31, 2024 at 08:16:04AM +0100, Ard Biesheuvel wrote:
> >> Hello Fangrui,
> >>
> >> On Wed, 31 Jan 2024 at 07:53, Fangrui Song <maskray@xxxxxxxxxx> wrote:
> >> >
> >> > The constraint "i" seems to be copied from x86 (and with a redundant
> >> > modifier "c"). It works with -fno-PIE but not with -fPIE/-fPIC in GCC's
> >> > aarch64 port.
> >
> >(I'm not sure of the exact history, but the "c" may be inherited from
> >arm, where an output modifier was needed to suppress the "#" that
> >prefixes immediates in the traditional asm syntax. This does not
> >actually seem to be required for AArch64: rather while a # is allowed
> >and still considered good style in handwritten asm code, the syntax
> >doesn't require it, and the compiler doesn't emit it for "i" arguments,
> >AFAICT.)
>
> The aarch64 one could be inherited from
> arch/arm/include/asm/jump_label.h (2012), which could in turn be
> inherited from x86 (2010).
> Both the constraint "i" and the modifier "c" are generic..
> For -fno-pic this combination can be used for every arch.
>
> >> > The constraint "S", which denotes a symbol reference (e.g. function,
> >> > global variable) or label reference, is more appropriate, and has been
> >> > available in GCC since 2012 and in Clang since 7.0.
> >> >
> >> > Signed-off-by: Fangrui Song <maskray@xxxxxxxxxx>
> >> > Link: https://maskray.me/blog/2024-01-30-raw-symbol-names-in-inline-assembly
> >> > ---
> >> > arch/arm64/include/asm/jump_label.h | 8 ++++----
> >> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> >> > index 48ddc0f45d22..31862b3bb33d 100644
> >> > --- a/arch/arm64/include/asm/jump_label.h
> >> > +++ b/arch/arm64/include/asm/jump_label.h
> >> > @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
> >> > " .pushsection __jump_table, \"aw\" \n\t"
> >> > " .align 3 \n\t"
> >> > " .long 1b - ., %l[l_yes] - . \n\t"
> >> > - " .quad %c0 - . \n\t"
> >> > + " .quad %0 - . \n\t"
> >> > " .popsection \n\t"
> >> > - : : "i"(&((char *)key)[branch]) : : l_yes);
> >> > + : : "S"(&((char *)key)[branch]) : : l_yes);
> >>
> >> 'key' is not used as a raw symbol name. We should make this
> >>
> >> " .quad %0 + %1 - ."
> >>
> >> and
> >>
> >> :: "S"(key), "i"(branch) :: l_yes);
> >>
> >> if we want to really clean this up.
> >
> >This hides more logic in the asm so it's arguably more cryptic
> >(although the code is fairly cryptic to begin with -- I don't really
> >see why the argument wasn't written as the equivalent
> >(char *)key + branch...)
>
> I agree that using "S" and "i" would introduce complexity.
> Using just "S" as this patch does should be clear.
>
> All of "i" "s" "S" support a symbol or label reference and a constant offset (can be zero),
> (in object file, a symbol and an addend; in GCC's term, the sum of a SYMBOL_REF and a CONST_INT).
>

Taken the address of a struct, cast it to char[] and then index it
using a boolean is rather disgusting, no?

> >Anyway, I don't think the "i" versys "S" distinction makes any
> >difference without -fpic or equivalent, so it is not really relevant
> >for the kernel (except that "S" breaks compatibility with older
> >compilers...)
> >
> >
> >I think the main advantage of "S" is that it stops you accidentally
> >emitting undesirable relocations from asm code that is not written for
> >the -fpic case.
> >
> >But just changing "i" to "S" is not sufficient to port asms to -fpic:
> >the asms still need to be reviewed.
> >
> >
> >So unless the asm has been reviewed for position-independence, it may
> >anyway be better to stick with "i" so that the compiler actually chokes
> >if someone tries to build the code with -fpic.
>

The virtual address of the kernel is randomized by KASLR, which relies
on PIE linking, and this puts constraints on the permitted types of
relocations.

IOW, we basically already build the kernel as PIC code, but without
relying on -fPIC, because that triggers some behaviors that only make
sense for shared objects in user space.

> >Since we are not trying to run arbitraily many running kernels in a
> >common address space (and not likely to do that), I'm not sure that we
> >would ever build the kernel with -fpic except for a few special-case
> >bits like the EFI stub and vDSO... unless I've missed something?
> >

Yes, KASLR. The number of kernels is not the point, the point is that
the virtual load address of the kernel is usually decided at boot, and
so the code needs to be generated to accommodate that.

> >If there's another reason why "S" is advantageous though, I'm happy to
> >be corrected.
>
> I remember that Ard has an RFC
> https://lore.kernel.org/linux-arm-kernel/20220427171241.2426592-1-ardb@xxxxxxxxxx/
> "[RFC PATCH 0/2] arm64: use PIE code generation for KASLR kernel"
> and see some recent PIE codegen patches.
>
> > Building the KASLR kernel without -fpie but linking it with -pie works
> > in practice, but it is not something that is explicitly supported by the
> > toolchains - it happens to work because the default 'small' code model
> > used by both GCC and Clang relies mostly on ADRP+ADD/LDR to generate
> > symbol references.
>
> I agree that current -fno-PIE with -shared -Bsymbolic linking is a hack
> that works as a conincidence, not guaranteed by the toolchain.
> This jump_label improvement (with no object file difference) fixes an
> obstacle.

If we can get the guaranteed behavior of #pragma GCC visibility
push(hidden) from a command line option, we should build the core
kernel with -fpie instead. (Modules are partially linked objects, so
they can be built non-PIC as before)