Re: [PATCH v2 10/11] x86/alternatives: Simplify ALTERNATIVE_n()

From: Peter Zijlstra
Date: Sat Sep 09 2023 - 05:26:23 EST


On Sat, Sep 09, 2023 at 09:50:09AM +0200, Borislav Petkov wrote:
> On Thu, Sep 07, 2023 at 05:30:36PM +0200, Borislav Petkov wrote:
> > But I might be missing something so lemme poke at it more.
>
> So my guest boots with the below diff on top of yours. That doesn't mean
> a whole lot but it looks like it DTRT. (Btw, the DPRINTK hunk is hoisted
> up only for debugging - not part of the change).
>
> And I've backed out your handling of the additional padding because, as
> we've established, that's not really additional padding but the padding
> which is missing when a subsequent sequence is longer.
>
> It all ends up being a single consecutive region of padding as it should
> be.
>
> Building that says:
>
> arch/x86/entry/entry_64.o: warning: objtool: entry_SYSCALL_64+0x91: weirdly overlapping alternative! 5 != 16
> arch/x86/entry/entry_64_compat.o: warning: objtool: entry_SYSENTER_compat+0x80: weirdly overlapping alternative! 5 != 16
>
> but that warning is bogus because the code in question is the
> UNTRAIN_RET macro which has an empty orig insn, then two CALLs of size
> 5 and then the RESET_CALL_DEPTH sequence which is 16 bytes.
>
> At build time it looks like this:
>
> ffffffff81c000d1: 90 nop
> ffffffff81c000d2: 90 nop
> ffffffff81c000d3: 90 nop
> ffffffff81c000d4: 90 nop
> ffffffff81c000d5: 90 nop
> ffffffff81c000d6: 90 nop
> ffffffff81c000d7: 90 nop
> ffffffff81c000d8: 90 nop
> ffffffff81c000d9: 90 nop
> ffffffff81c000da: 90 nop
> ffffffff81c000db: 90 nop
> ffffffff81c000dc: 90 nop
> ffffffff81c000dd: 90 nop
> ffffffff81c000de: 90 nop
> ffffffff81c000df: 90 nop
> ffffffff81c000e0: 90 nop
>
> and those are 16 contiguous NOPs of padding.
>
> At boot time, it does:
>
> [ 0.679523] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x59/0xd8 (ffffffff81c000d1) len: 5), repl: (ffffffff833a362b, len: 5)
> [ 0.683516] SMP alternatives: ffffffff81c000d1: [0:5) optimized NOPs: 0f 1f 44 00 00
>
> That first one is X86_FEATURE_UNRET and the alt_instr descriptor simply
> says that the replacement is 5 bytes long, which is the CALL that can
> potentially be poked in. It doesn't care about the following 11 bytes of
> padding because it doesn't matter - it wants 5 bytes only for the CALL.
>
> [ 0.687514] SMP alternatives: feat: 11*32+10, old: (entry_SYSCALL_64_after_hwframe+0x59/0xd8 (ffffffff81c000d1) len: 5), repl: (ffffffff833a3630, len: 5)
> [ 0.691521] SMP alternatives: ffffffff81c000d1: [0:5) optimized NOPs: 0f 1f 44 00 00
>
> This is X86_FEATURE_ENTRY_IBPB. Same thing.
>
> [ 0.695515] SMP alternatives: feat: 11*32+19, old: (entry_SYSCALL_64_after_hwframe+0x59/0xd8 (ffffffff81c000d1) len: 16), repl: (ffffffff833a3635, len: 16)
> [ 0.699516] SMP alternatives: ffffffff81c000d1: [0:16) optimized NOPs: eb 0e cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>
> And this is X86_FEATURE_CALL_DEPTH and here the alt_instr descriptor has
> replacement length of 16 and that is all still ok as it starts at the
> same address and contains the first 5 bytes from the previous entries
> which overlap here.
>
> So address-wise we're good, the alt_instr patching descriptors are
> correct and we should be good.
>
> Thoughts?
>
> ---
>

> @@ -415,22 +415,18 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
> for (a = start; a < end; a++) {
> int insn_buff_sz = 0;
>
> - /*
> - * In case of nested ALTERNATIVE()s the outer alternative might
> - * add more padding. To ensure consistent patching find the max
> - * padding for all alt_instr entries for this site (nested
> - * alternatives result in consecutive entries).
> - */
> - for (b = a+1; b < end && b->instr_offset == a->instr_offset; b++) {
> - u8 len = max(a->instrlen, b->instrlen);
> - a->instrlen = b->instrlen = len;
> - }
> -
> instr = (u8 *)&a->instr_offset + a->instr_offset;
> replacement = (u8 *)&a->repl_offset + a->repl_offset;
> BUG_ON(a->instrlen > sizeof(insn_buff));
> BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
>

> diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
> index 7145920a7aba..29e949579ede 100644
> --- a/tools/objtool/arch/x86/special.c
> +++ b/tools/objtool/arch/x86/special.c
> @@ -9,29 +9,6 @@
>
> void arch_handle_alternative(unsigned short feature, struct special_alt *alt)
> {
> - static struct special_alt *group, *prev;
> -
> - /*
> - * Recompute orig_len for nested ALTERNATIVE()s.
> - */
> - if (group && group->orig_sec == alt->orig_sec &&
> - group->orig_off == alt->orig_off) {
> -
> - struct special_alt *iter = group;
> - for (;;) {
> - unsigned int len = max(iter->orig_len, alt->orig_len);
> - iter->orig_len = alt->orig_len = len;
> -
> - if (iter == prev)
> - break;
> -
> - iter = list_next_entry(iter, list);
> - }
> -
> - } else group = alt;
> -
> - prev = alt;
> -
> switch (feature) {
> case X86_FEATURE_SMAP:
> /*

Yeah, that wasn't optional.

So what you end up with is:

661:
"one byte orig insn"
"one nop because alt1 is 2 bytes"
"one nop because alt2 is 3 bytes"

right?

But your alt_instr are:

alt_instr1 = {
.instr_offset = 661b-.; /* .text location */
.repl_offset = 664f-.; /* .altinstr_replacement location */

/* .ft_flags */

.instrlen = 2;
.replacementlen = 2;
}

alt_instr2 = {
.instr_offset = 661b-.;
.repl_offset = 664f-.;

/* .ft_flags */

.instrlen = 3;
.replacementlen = 3;
}


So if you patch alt1, you will only patch 2 bytes of the original text,
even though that site has 3 bytes of 'space'.


This becomes more of a problem with your example above, where the
respective lengths are 0, 5 and 16. In that case, when you patch in a
5-byte CALL, you'll leave 11 single-byte NOPs in there.

So what the code you deleted does is look for all alternatives that
start at the same point and compute the max instrlen among them (which
covers the largest replacementlen at that site), because that is the
number of bytes in the source text that have been reserved for this
alternative.

That is not optional.