Re: [PATCH] arm64: jump_label: use constraint "S" instead of "i"

From: Dave Martin
Date: Thu Feb 01 2024 - 06:51:00 EST


On Thu, Feb 01, 2024 at 01:11:20AM -0800, Fangrui Song wrote:
> On 2024-02-01, Ard Biesheuvel wrote:
> > On Thu, 1 Feb 2024 at 05:55, Fangrui Song <maskray@xxxxxxxxxx> wrote:
> > >
> > > On 2024-01-31, Dave Martin wrote:
> > > >On Wed, Jan 31, 2024 at 08:16:04AM +0100, Ard Biesheuvel wrote:
> > > >> Hello Fangrui,
> > > >>
> > > >> On Wed, 31 Jan 2024 at 07:53, Fangrui Song <maskray@xxxxxxxxxx> wrote:
> > > >> >
> > > >> > The constraint "i" seems to be copied from x86 (and with a redundant
> > > >> > modifier "c"). It works with -fno-PIE but not with -fPIE/-fPIC in GCC's
> > > >> > aarch64 port.
> > > >
> > > >(I'm not sure of the exact history, but the "c" may be inherited from
> > > >arm, where an output modifier was needed to suppress the "#" that
> > > >prefixes immediates in the traditional asm syntax. This does not
> > > >actually seem to be required for AArch64: rather while a # is allowed
> > > >and still considered good style in handwritten asm code, the syntax
> > > >doesn't require it, and the compiler doesn't emit it for "i" arguments,
> > > >AFAICT.)
> > >
> > > The aarch64 one could be inherited from
> > > arch/arm/include/asm/jump_label.h (2012), which could in turn be
> > > inherited from x86 (2010).
> > > Both the constraint "i" and the modifier "c" are generic..
> > > For -fno-pic this combination can be used for every arch.

Fair enough. It wasn't clear to me that "c" was generic; the output
modifiers have always been a bit of a black art...

> > >
> > > >> > The constraint "S", which denotes a symbol reference (e.g. function,
> > > >> > global variable) or label reference, is more appropriate, and has been
> > > >> > available in GCC since 2012 and in Clang since 7.0.
> > > >> >
> > > >> > Signed-off-by: Fangrui Song <maskray@xxxxxxxxxx>
> > > >> > Link: https://maskray.me/blog/2024-01-30-raw-symbol-names-in-inline-assembly
> > > >> > ---
> > > >> > arch/arm64/include/asm/jump_label.h | 8 ++++----
> > > >> > 1 file changed, 4 insertions(+), 4 deletions(-)
> > > >> >
> > > >> > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> > > >> > index 48ddc0f45d22..31862b3bb33d 100644
> > > >> > --- a/arch/arm64/include/asm/jump_label.h
> > > >> > +++ b/arch/arm64/include/asm/jump_label.h
> > > >> > @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
> > > >> > " .pushsection __jump_table, \"aw\" \n\t"
> > > >> > " .align 3 \n\t"
> > > >> > " .long 1b - ., %l[l_yes] - . \n\t"
> > > >> > - " .quad %c0 - . \n\t"
> > > >> > + " .quad %0 - . \n\t"
> > > >> > " .popsection \n\t"
> > > >> > - : : "i"(&((char *)key)[branch]) : : l_yes);
> > > >> > + : : "S"(&((char *)key)[branch]) : : l_yes);
> > > >>
> > > >> 'key' is not used as a raw symbol name. We should make this
> > > >>
> > > >> " .quad %0 + %1 - ."
> > > >>
> > > >> and
> > > >>
> > > >> :: "S"(key), "i"(branch) :: l_yes);
> > > >>
> > > >> if we want to really clean this up.
> > > >
> > > >This hides more logic in the asm so it's arguably more cryptic
> > > >(although the code is fairly cryptic to begin with -- I don't really
> > > >see why the argument wasn't written as the equivalent
> > > >(char *)key + branch...)
> > >
> > > I agree that using "S" and "i" would introduce complexity.
> > > Using just "S" as this patch does should be clear.
> > >
> > > All of "i" "s" "S" support a symbol or label reference and a constant offset (can be zero),
> > > (in object file, a symbol and an addend; in GCC's term, the sum of a SYMBOL_REF and a CONST_INT).
> > >
> >
> > Taken the address of a struct, cast it to char[] and then index it
> > using a boolean is rather disgusting, no?
>
> I agree with you.
>
> Hmm. Clang's constraint "S" implementation doesn't support a constant
> offset, so
> `static_key_false(&nf_hooks_needed[pf][hook])` in include/linux/netfilter.h:nf_hook
> would not compile with Clang <= 18.
>
> I have a patch https://github.com/llvm/llvm-project/pull/80255 , but
> even if it is accepted and cherry-picked into the 18.x release branch,
> if we still support older Clang, we cannot use "S" unconditionally.
>
>
> So we probably need the following to prepare for -fPIE support in the
> future:
>
> diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> index 48ddc0f45d22..b8af2f8b0c99 100644
> --- a/arch/arm64/include/asm/jump_label.h
> +++ b/arch/arm64/include/asm/jump_label.h
> @@ -15,6 +15,16 @@
> #define JUMP_LABEL_NOP_SIZE AARCH64_INSN_SIZE
> +/*
> + * Prefer "S" to support PIC. However, use "i" for Clang 18 and earlier as "S"
> + * on a symbol with a constant offset is not supported.
> + */
> +#if defined(CONFIG_CC_IS_CLANG) && __clang_major__ <= 18

Is a GCC version check needed? Or is the minimum GCC version specified
for building the kernel new enough?

> +#define JUMP_LABEL_STATIC_KEY_CONSTRAINT "i"
> +#else
> +#define JUMP_LABEL_STATIC_KEY_CONSTRAINT "S"
> +#endif
> +
> static __always_inline bool arch_static_branch(struct static_key * const key,
> const bool branch)
> {
> @@ -23,9 +33,9 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
> " .pushsection __jump_table, \"aw\" \n\t"
> " .align 3 \n\t"
> " .long 1b - ., %l[l_yes] - . \n\t"
> - " .quad %c0 - . \n\t"
> + " .quad %0 + %1 - . \n\t"
> " .popsection \n\t"
> - : : "i"(&((char *)key)[branch]) : : l_yes);
> + : : JUMP_LABEL_STATIC_KEY_CONSTRAINT(key), "i"(branch) : : l_yes);

While this looks reasonable, I'm still not clear now why the asm must do
the addition.

Can we roll the branch argument into the macro?

"S"(symbol + constant) seems to do the right thing for AArch64, at least
in GCC.

> return false;
> l_yes:
> @@ -40,9 +50,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key * const ke
> " .pushsection __jump_table, \"aw\" \n\t"
> " .align 3 \n\t"
> " .long 1b - ., %l[l_yes] - . \n\t"
> - " .quad %c0 - . \n\t"
> + " .quad %0 + %1 - . \n\t"
> " .popsection \n\t"
> - : : "i"(&((char *)key)[branch]) : : l_yes);
> + : : JUMP_LABEL_STATIC_KEY_CONSTRAINT(key), "i"(branch) : : l_yes);
> return false;
> l_yes:
>
>
> > > >Anyway, I don't think the "i" versys "S" distinction makes any
> > > >difference without -fpic or equivalent, so it is not really relevant
> > > >for the kernel (except that "S" breaks compatibility with older
> > > >compilers...)
> > > >
> > > >
> > > >I think the main advantage of "S" is that it stops you accidentally
> > > >emitting undesirable relocations from asm code that is not written for
> > > >the -fpic case.
> > > >
> > > >But just changing "i" to "S" is not sufficient to port asms to -fpic:
> > > >the asms still need to be reviewed.
> > > >
> > > >
> > > >So unless the asm has been reviewed for position-independence, it may
> > > >anyway be better to stick with "i" so that the compiler actually chokes
> > > >if someone tries to build the code with -fpic.
> > >
> >
> > The virtual address of the kernel is randomized by KASLR, which relies
> > on PIE linking, and this puts constraints on the permitted types of
> > relocations.
> >
> > IOW, we basically already build the kernel as PIC code, but without
> > relying on -fPIC, because that triggers some behaviors that only make
> > sense for shared objects in user space.
> >
> > > >Since we are not trying to run arbitraily many running kernels in a
> > > >common address space (and not likely to do that), I'm not sure that we
> > > >would ever build the kernel with -fpic except for a few special-case
> > > >bits like the EFI stub and vDSO... unless I've missed something?
> > > >
> >
> > Yes, KASLR. The number of kernels is not the point, the point is that
> > the virtual load address of the kernel is usually decided at boot, and
> > so the code needs to be generated to accommodate that.
> >
> > > >If there's another reason why "S" is advantageous though, I'm happy to
> > > >be corrected.
> > >
> > > I remember that Ard has an RFC
> > > https://lore.kernel.org/linux-arm-kernel/20220427171241.2426592-1-ardb@xxxxxxxxxx/
> > > "[RFC PATCH 0/2] arm64: use PIE code generation for KASLR kernel"
> > > and see some recent PIE codegen patches.
> > >
> > > > Building the KASLR kernel without -fpie but linking it with -pie works
> > > > in practice, but it is not something that is explicitly supported by the
> > > > toolchains - it happens to work because the default 'small' code model
> > > > used by both GCC and Clang relies mostly on ADRP+ADD/LDR to generate
> > > > symbol references.

Does this mean that doing boot-time relocation on a fully statically
linked kernel doesn't bring much benefit? It would tend to be more
painful to do the relocation work, and mean that the kernel text has to
be temporarily writeable, but static linking should have the best
runtime performance.

Maybe it doesn't make as much difference as I thought (certainly ADRP
based addressing should make a fair amount of the PIC/PIE overhead go
away).

> > >
> > > I agree that current -fno-PIE with -shared -Bsymbolic linking is a hack
> > > that works as a conincidence, not guaranteed by the toolchain.
> > > This jump_label improvement (with no object file difference) fixes an
> > > obstacle.
> >
> > If we can get the guaranteed behavior of #pragma GCC visibility
> > push(hidden) from a command line option, we should build the core
> > kernel with -fpie instead. (Modules are partially linked objects, so
> > they can be built non-PIC as before)
> >
> I believe we don't have such a GCC option, but the effect can be
> simulated by -include hidden.h ...

[...]

Cheers
---Dave