Re: [PATCH 4/7] arm64: add 'runtime constant' support

From: Mark Rutland
Date: Tue Jun 11 2024 - 13:49:14 EST


On Tue, Jun 11, 2024 at 09:56:17AM -0700, Linus Torvalds wrote:
> On Tue, 11 Jun 2024 at 07:29, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Do we expect to use this more widely? If this only really matters for
> > d_hash() it might be better to handle this via the alternatives
> > framework with callbacks and avoid the need for new infrastructure.
>
> Hmm. The notion of a callback for alternatives is intriguing and would
> be very generic, but we don't have anything like that right now.
>
> Is anybody willing to implement something like that? Because while I
> like the idea, it sounds like a much bigger change.

Fair enough if that's a pain on x86, but we already have callback alternatives
on arm64, and hence using them is a smaller change there. There are already a
couple of cases which use a MOVZ;MOVK;MOVK;MOVK sequence, e.g.

	// in __invalidate_icache_max_range()
	asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
				    "movk %0, #0, lsl #16\n"
				    "movk %0, #0, lsl #32\n"
				    "movk %0, #0, lsl #48\n",
				    ARM64_ALWAYS_SYSTEM,
				    kvm_compute_final_ctr_el0)
		     : "=r" (ctr));

... which is patched via the callback:

	void kvm_compute_final_ctr_el0(struct alt_instr *alt,
				       __le32 *origptr, __le32 *updptr,
				       int nr_inst)
	{
		generate_mov_q(read_sanitised_ftr_reg(SYS_CTR_EL0),
			       origptr, updptr, nr_inst);
	}

... where the generate_mov_q() helper does the actual instruction generation.
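
For reference, that helper lives in arch/arm64/kvm/va_layout.c and boils down
to something like the below (a condensed sketch, not the verbatim kernel
code):

	static void generate_mov_q(u64 val, __le32 *origptr, __le32 *updptr,
				   int nr_inst)
	{
		u32 insn, oinsn, rd;
		int shift;

		BUG_ON(nr_inst != 4);

		/* Reuse the destination register of the original MOVZ */
		oinsn = le32_to_cpu(*origptr);
		rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);

		/* movz rd, #(val & 0xffff) */
		insn = aarch64_insn_gen_movewide(rd, (u16)val, 0,
						 AARCH64_INSN_VARIANT_64BIT,
						 AARCH64_INSN_MOVEWIDE_ZERO);
		*updptr++ = cpu_to_le32(insn);

		/* movk rd, #((val >> shift) & 0xffff), lsl #shift */
		for (shift = 16; shift < 64; shift += 16) {
			insn = aarch64_insn_gen_movewide(rd, (u16)(val >> shift),
							 shift,
							 AARCH64_INSN_VARIANT_64BIT,
							 AARCH64_INSN_MOVEWIDE_KEEP);
			*updptr++ = cpu_to_le32(insn);
		}
	}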

So if we only care about a few specific constants, we could give them their own
callbacks, like kvm_compute_final_ctr_el0() above.
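
For d_hash() that could look something like the below (hypothetical sketch:
the callback name is made up, and the MOVZ;MOVK;MOVK;MOVK site in d_hash()
would be an ALTERNATIVE_CB as in the example above):

	/* Hypothetical: patch the dentry hashtable pointer into d_hash() */
	void compute_final_dentry_hashtable(struct alt_instr *alt,
					    __le32 *origptr, __le32 *updptr,
					    int nr_inst)
	{
		generate_mov_q((u64)dentry_hashtable, origptr, updptr, nr_inst);
	}

... with a similar callback for d_hash_shift.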

[...]

> > We have some helpers for instruction manipulation, and we can use
> > aarch64_insn_encode_immediate() here, e.g.
> >
> > #include <asm/insn.h>
> >
> > static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
> > {
> >         u32 insn = le32_to_cpu(*p);
> >         insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_16, insn, val);
> >         *p = cpu_to_le32(insn);
> > }
>
> Ugh. I did that, and then noticed that it makes the generated code
> about ten times bigger.
>
> That interface looks positively broken.
>
> There is absolutely nobody who actually wants a dynamic argument, so
> it would have made both the callers and the implementation *much*
> simpler had the "AARCH64_INSN_IMM_16" been encoded in the function
> name the way I did it for my instruction rewriting.
>
> It would have made the use of it simpler, it would have avoided all
> the "switch (type)" garbage, and it would have made it all generate
> much better code.

Oh, completely agreed. FWIW, I have better versions sat in my
arm64/insn/rework branch, but I haven't had the time to get all the rest
of the insn framework cleanup sorted:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/insn/rework&id=9cf0ec088c9d5324c60933bf3924176fea0a4d0b

I can go prioritise getting that bit out if it'd help, or I can clean
this up later.

Those allow the compiler to do much better, including compile-time (or
runtime) checks that immediates fit. For example:

	void encode_imm16(__le32 *p, u16 imm)
	{
		u32 insn = le32_to_cpu(*p);

		// Would warn if 'imm' were u32.
		// As u16 always fits, no warning.
		BUILD_BUG_ON(!aarch64_insn_try_encode_unsigned_imm16(&insn, imm));

		*p = cpu_to_le32(insn);
	}

... compiles to:

	<encode_imm16>:
		ldr	w2, [x0]
		bfi	w2, w1, #5, #16
		str	w2, [x0]
		ret

... which I think is what you want?
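
FWIW, on top of that helper the __runtime_fixup_16() from your patch could
then be written as something like the below (a sketch only, since that helper
hasn't landed yet):

	static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
	{
		u32 insn = le32_to_cpu(*p);

		/* As in encode_imm16() above: a u16 always fits */
		BUILD_BUG_ON(!aarch64_insn_try_encode_unsigned_imm16(&insn, (u16)val));
		*p = cpu_to_le32(insn);
	}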

> So I did that change you suggested, and then undid it again.
>
> Because that whole aarch64_insn_encode_immediate() thing is an
> abomination, and should be burned at the stake. It's misdesigned in
> the *worst* possible way.
>
> And no, this code isn't performance-critical, but I have some taste,
> and the code I write will not be using that garbage.

Fair enough.

Mark.