Re: [GIT PULL] x86/boot enhancements for v6.14

From: Ard Biesheuvel
Date: Mon Jan 27 2025 - 06:12:23 EST


(cc Nathan)

On Mon, 27 Jan 2025 at 04:19, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 21 Jan 2025 at 13:29, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > - A series to remove the last remaining absolute symbol references from
> > .head.text, and enforce this at build time, by Ard Biesheuvel:
> > [...]
> > - Which build-time enforcement uncovered a handful of bugs of essentially
> > non-working code, and a workaround for a toolchain bug, fixed by
> > Ard Biesheuvel as well:
> >
> > - Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
> > - Disable UBSAN on SEV code that may execute very early
> > - Disable ftrace branch profiling in SEV startup code
>
> Bah. I only noticed this today, because I was on the road part of the
> week and didn't do my usual "build with clang".
>
> But this is broken with my normal clang config, and I get a very
> unhelpful error message:
>
> Absolute reference to symbol '.rodata' not permitted in .head.text
>

Agreed. I'll change this to section+addend if the symbol is
STT_SECTION, and provide the offset into .head.text as well.
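
Roughly along these lines, as a minimal sketch only: this is not the
actual build-time check, and the helper name, its parameters and the
message wording are all hypothetical. It assumes the caller has already
looked up the symbol (or section) name, the relocation addend and the
offset of the relocation within .head.text.

```c
#include <elf.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the real check): format a diagnostic for an
 * absolute relocation found in .head.text.
 *
 * st_info: st_info field of the ELF symbol the relocation refers to
 * name:    symbol name, or the section name for an STT_SECTION symbol
 * addend:  relocation addend
 * offset:  offset of the relocation within .head.text
 */
static int format_abs_reloc_diag(char *buf, size_t len,
				 unsigned char st_info, const char *name,
				 int64_t addend, uint64_t offset)
{
	if (ELF64_ST_TYPE(st_info) == STT_SECTION) {
		/* Negate as unsigned so that INT64_MIN does not overflow. */
		uint64_t mag = addend < 0 ? 0 - (uint64_t)addend
					  : (uint64_t)addend;

		/* Section symbol: the name alone says little, add the addend. */
		return snprintf(buf, len,
				"Absolute reference to '%s%c0x%" PRIx64
				"' at .head.text+0x%" PRIx64 " not permitted",
				name, addend < 0 ? '-' : '+', mag, offset);
	}

	/* Named symbol: the symbol name is already meaningful. */
	return snprintf(buf, len,
			"Absolute reference to symbol '%s' at .head.text+0x%"
			PRIx64 " not permitted", name, offset);
}
```

With the section+addend printed this way, the output can be matched
directly against the relocation lines that objdump -r prints.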

...

> Anyway, not knowing what the right thing to do is, I hacked up the
> makefiles to squirrel off a copy of the vmlinux file, and did
>
> objdump --no-addresses --no-show-raw-insn \
> -j .head.text --disassemble \
> -rR ORIGINAL | less -S
>
> and sure enough, it shows things like this:
>
> <snp_cpuid>:
> push %rbp
> push %r15
> push %r14
> push %rbx
> sub $0x18,%rsp
> lea 0x5ac8fb(%rip),%r8 # <cpuid_table_copy>
> R_X86_64_PC32 .rodata+0x4f1758
> mov (%r8),%eax
> test %eax,%eax
> ..
> lea 0x5ac841(%rip),%rax # <cpuid_std_range_max>
> R_X86_64_PC32 .rodata+0x4f174c
> ..
> jmp *-0x7dfffe90(,%r9,8)
> R_X86_64_32S .rodata+0x170

So this last one (the R_X86_64_32S relocation against .rodata+0x170)
should be the culprit: all the other references are RIP-relative, so
those are fine.
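
To illustrate the distinction: a PC-relative relocation resolves to the
same target wherever the code happens to be running from, whereas an
absolute one bakes in the link-time address. A sketch of how the two
kinds could be told apart, assuming the relocation type constants from
<elf.h>; the function name is hypothetical, and the list of absolute
types is an assumption covering the common cases, not an exhaustive one:

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical classifier (not the real check): return true for x86-64
 * relocation types that encode an absolute address.
 */
static bool reloc_is_absolute(uint32_t type)
{
	switch (type) {
	case R_X86_64_64:	/* 64-bit absolute address */
	case R_X86_64_32:	/* 32-bit absolute, zero-extended */
	case R_X86_64_32S:	/* 32-bit absolute, sign-extended */
		return true;
	default:
		/* e.g. R_X86_64_PC32 and R_X86_64_PLT32 are PC-relative */
		return false;
	}
}
```

By this test, the two R_X86_64_PC32 relocations in the listing pass and
only the R_X86_64_32S on the jmp is flagged.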

I guess you are disabling retpolines and IBT in your Clang config?
This looks like the switch() in snp_cpuid_postprocess() being emitted
as a jump table.
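
For reference, a small stand-in for the pattern in question. This is
not the actual snp_cpuid_postprocess() code; the function name and the
case bodies are made up. With dense case values like these, a compiler
may lower the switch to an indexed indirect jump through a table of
absolute addresses in .rodata, which is the 'jmp *disp(,%reg,8)' form
in the disassembly above. GCC and Clang both accept -fno-jump-tables to
suppress this lowering.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a switch over dense values. Whether a jump
 * table is actually emitted depends on the compiler, the optimization
 * level and flags such as -fno-jump-tables.
 */
static uint32_t postprocess_leaf(uint32_t leaf, uint32_t eax)
{
	switch (leaf) {
	case 0: return eax + 1;
	case 1: return eax & 0xff;
	case 2: return eax | 0x10;
	case 3: return ~eax;
	case 4: return eax << 1;
	case 5: return eax >> 1;
	case 6: return 0;
	case 7: return eax * 3;
	default: return eax;	/* leaves without special handling */
	}
}
```

The C source contains no absolute reference at all; it only appears
once the compiler decides to materialize the table, which is why this
is easy to miss without a build-time check.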

This is another example of a pattern that is simply broken and
guaranteed to fail when booting this kernel as a SEV-SNP guest. So I
hope we agree that these issues should be detected at build time, and
it is only the quality of the diagnostic message that you are
objecting to?

> Anyway, that check needs to either
>
> (a) die a painful death very quickly
>
> (b) be made to actually print out useful information of WHERE the
> relocation comes from and WHERE it points to
>
> because the current implementation of that check is not acceptable.
>
> The next time I have to play makefile games and then do objdump by
> hand to figure out what the %^$% the build is complaining about, I'm
> just reverting it outright and not writing this long explanation of
> the problem.
>

Understood.