[PATCH v3 00/19] x86: Confine early 1:1 mapped startup code
From: Ard Biesheuvel
Date: Mon Jan 29 2024 - 13:05:38 EST
From: Ard Biesheuvel <ardb@xxxxxxxxxx>
This is a follow-up to my RFC [0], which proposed building the entire core
kernel with -fPIC to reduce the likelihood that code running extremely
early from the 1:1 mapping of memory will misbehave.
This is needed to address reports that SEV boot is broken on Clang-built
kernels, because this early code attempts to access virtual kernel
addresses that are not mapped yet. Kevin has suggested some workarounds
for this [1], but this really requires a more rigorous approach rather
than addressing a couple of symptoms of the underlying defect.
As it turns out, using -fPIE for the entire kernel is neither necessary
nor sufficient, and it has its own set of problems, including the fact
that the small PIE C code model uses FS rather than GS as the per-CPU
segment register, and only recent GCC and Clang versions permit this to
be overridden on the command line.
But the real problem is that even position-independent code is not
guaranteed to execute correctly at any offset unless all statically
initialized pointer variables use the same translation as the code.
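To make the last point concrete, here is a small model of the two kinds of
reference. It is a sketch only: the base addresses and image size are
hypothetical values picked for illustration, and the functions model address
arithmetic rather than real code generation. A RIP-relative reference is
computed from wherever the code currently runs, so it follows the code into
the 1:1 mapping, whereas a statically initialized pointer holds the address
that was stored at build/link time, i.e. the kernel virtual address.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the image is linked to run
 * at LINK_VA_BASE but was loaded at LOAD_PA_BASE, and this early only the
 * 1:1 (identity) mapping of the loaded image exists.
 */
#define LINK_VA_BASE	0xffffffff81000000ULL
#define LOAD_PA_BASE	0x0000000001000000ULL
#define IMAGE_SIZE	0x0000000002000000ULL

/* True if addr falls inside the only mapping that exists this early. */
static inline bool early_mapped(uint64_t addr)
{
	return addr >= LOAD_PA_BASE && addr < LOAD_PA_BASE + IMAGE_SIZE;
}

/*
 * Models a RIP-relative reference: resolved against the address the code
 * is running at right now, so it stays inside the 1:1 mapping.
 */
static inline uint64_t rip_relative_ref(uint64_t running_base,
					uint64_t sym_offset)
{
	return running_base + sym_offset;
}

/*
 * Models a statically initialized pointer: it contains the link-time
 * virtual address, regardless of where the code that loads it runs.
 */
static inline uint64_t static_pointer_ref(uint64_t sym_offset)
{
	return LINK_VA_BASE + sym_offset;
}
```

Called from code running at LOAD_PA_BASE, rip_relative_ref() yields an address
inside the identity-mapped image, while static_pointer_ref() for the same
symbol yields a kernel virtual address that early_mapped() rejects. Building
with PIC codegen changes how the code computes addresses, but not the value a
pointer variable was initialized with.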
So instead, v2 and later versions of this series propose another
solution, taking the following approach:
- clean up and refactor the startup code so that only the primary startup
code executes from the 1:1 mapping, and nothing else does;
- define a new text section type .pi.text and enforce that it can only
call into other .pi.text sections;
- (tbd) require that objects containing .pi.text sections are built with
-fPIC, and disallow any absolute references from such objects.
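The call restriction in the second item can be expressed as a predicate over
(caller section, callee section) pairs, roughly the shape of check that the
"modpost: Warn about calls from __pitext into other text sections" patch
performs. This is a hypothetical sketch and not the modpost code: the function
names are made up, and treating ".pi.text.<suffix>" as position-independent
text is an assumed naming convention.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: treat ".pi.text" and any ".pi.text.<suffix>" section as
 * position-independent text. ".pi.textfoo" does not match.
 */
static inline bool is_pitext(const char *sec)
{
	static const char base[] = ".pi.text";
	const size_t n = sizeof(base) - 1;

	/* sec[n] is valid here: the prefix match guarantees n characters. */
	return strncmp(sec, base, n) == 0 && (sec[n] == '\0' || sec[n] == '.');
}

/*
 * Code in .pi.text may only call into other .pi.text sections. Calls made
 * from any other section are not restricted by this rule.
 */
static inline bool pitext_call_allowed(const char *from_sec, const char *to_sec)
{
	return !is_pitext(from_sec) || is_pitext(to_sec);
}
```

With this rule, a call from ".pi.text" into ".text" or ".init.text" is
flagged, while regular kernel code remains free to call into ".pi.text".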
The latter point is not implemented yet in this v3, but it could be done
rather straightforwardly (the EFI stub already does something similar
across all architectures).
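One way to disallow absolute references is to inspect the relocation types
emitted for such objects. The sketch below is an assumption about what such a
policy could look like on x86-64, not the planned implementation: it treats
R_X86_64_64, R_X86_64_32 and R_X86_64_32S as absolute (they resolve to the
link-time virtual address), while PC-relative types such as R_X86_64_PC32 and
R_X86_64_PLT32 pass. The relocation constants come from the system <elf.h>.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical policy for objects containing .pi.text: reject relocation
 * types that encode an absolute address, since those resolve to the
 * link-time (kernel virtual) address rather than the 1:1 mapping.
 */
static inline bool reloc_is_absolute(unsigned int type)
{
	switch (type) {
	case R_X86_64_64:	/* 64-bit absolute address */
	case R_X86_64_32:	/* 32-bit zero-extended absolute address */
	case R_X86_64_32S:	/* 32-bit sign-extended absolute address */
		return true;
	default:		/* e.g. R_X86_64_PC32, R_X86_64_PLT32 */
		return false;
	}
}
```

A tool walking an object's .rela sections would pass
ELF64_R_TYPE(rela->r_info) for each Elf64_Rela entry that applies to a
.pi.text section, and fail the build if any of them is absolute.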
Changes since v2: [2]
- move command line parsing out of early startup code entirely
- fix LTO and instrumentation related build warnings reported by Nathan
- omit PTI-related PGD/P4D setters when creating the early page tables,
instead of pulling that code into the 'early' set
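Moving command line parsing out of the early startup code means that
something like mem_encrypt= is interpreted before the kernel proper starts
(the first two patches in the series move it into the EFI stub and the
decompressor). The following is a minimal, hypothetical sketch of scanning a
command line for mem_encrypt=on|off; the enum, the function name, the
last-occurrence-wins policy and the lack of quote handling are all
assumptions, not the code from the series.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type; the series may represent this differently. */
enum mem_encrypt_opt {
	MEM_ENCRYPT_UNSET,	/* option not present (or value not recognized) */
	MEM_ENCRYPT_ON,
	MEM_ENCRYPT_OFF,
};

/*
 * Scan a space-separated command line for mem_encrypt=on|off. Later
 * occurrences override earlier ones; unrecognized values are ignored.
 * Quoted arguments are not handled, which a real parser would need to do.
 */
static inline enum mem_encrypt_opt parse_mem_encrypt(const char *cmdline)
{
	static const char key[] = "mem_encrypt=";
	const size_t klen = sizeof(key) - 1;
	enum mem_encrypt_opt opt = MEM_ENCRYPT_UNSET;
	const char *p = cmdline;

	while (*p) {
		const char *end;
		size_t len;

		while (*p == ' ')	/* skip separators */
			p++;
		end = strchr(p, ' ');
		len = end ? (size_t)(end - p) : strlen(p);

		if (len > klen && strncmp(p, key, klen) == 0) {
			const char *val = p + klen;
			size_t vlen = len - klen;

			if (vlen == 2 && strncmp(val, "on", 2) == 0)
				opt = MEM_ENCRYPT_ON;
			else if (vlen == 3 && strncmp(val, "off", 3) == 0)
				opt = MEM_ENCRYPT_OFF;
		}
		p += len;	/* advance to the next separator or the NUL */
	}
	return opt;
}
```

The result would then be handed to the kernel proper (for instance through
boot_params), so the code running from the 1:1 mapping only consumes a
precomputed flag instead of parsing strings itself.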
[0] https://lkml.kernel.org/r/20240122090851.851120-7-ardb%2Bgit%40google.com
[1] https://lore.kernel.org/all/20240111223650.3502633-1-kevinloughlin@xxxxxxxxxx/T/#u
[2] https://lkml.kernel.org/r/20240125112818.2016733-19-ardb%2Bgit%40google.com
Cc: Kevin Loughlin <kevinloughlin@xxxxxxxxxx>
Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
Cc: Dionna Glaze <dionnaglaze@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Nathan Chancellor <nathan@xxxxxxxxxx>
Cc: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
Cc: Justin Stitt <justinstitt@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: linux-arch@xxxxxxxxxxxxxxx
Cc: llvm@xxxxxxxxxxxxxxx
Ard Biesheuvel (19):
efi/libstub: Add generic support for parsing mem_encrypt=
x86/boot: Move mem_encrypt= parsing to the decompressor
x86/startup_64: Drop long return to initial_code pointer
x86/startup_64: Simplify calculation of initial page table address
x86/startup_64: Simplify CR4 handling in startup code
x86/startup_64: Drop global variables keeping track of LA57 state
x86/startup_64: Simplify virtual switch on primary boot
x86/head64: Replace pointer fixups with PIE codegen
x86/head64: Simplify GDT/IDT initialization code
asm-generic: Add special .pi.text section for position independent code
x86: Move return_thunk to __pitext section
x86/head64: Move early startup code into __pitext
modpost: Warn about calls from __pitext into other text sections
x86/coco: Make cc_set_mask() static inline
x86/sev: Make all code reachable from 1:1 mapping __pitext
x86/sev: Avoid WARN() in early code
x86/sev: Use PIC codegen for early SEV startup code
x86/sev: Drop inline asm LEA instructions for RIP-relative references
x86/startup_64: Don't bother setting up GS before the kernel is mapped
arch/x86/Makefile | 8 +
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 22 +++
arch/x86/boot/compressed/pgtable_64.c | 2 -
arch/x86/boot/compressed/sev.c | 6 +
arch/x86/coco/core.c | 7 +-
arch/x86/include/asm/coco.h | 8 +-
arch/x86/include/asm/desc.h | 3 +-
arch/x86/include/asm/init.h | 2 -
arch/x86/include/asm/mem_encrypt.h | 8 +-
arch/x86/include/asm/pgtable_64.h | 12 +-
arch/x86/include/asm/pgtable_64_types.h | 15 +-
arch/x86/include/asm/setup.h | 4 +-
arch/x86/include/asm/sev.h | 6 +-
arch/x86/include/uapi/asm/bootparam.h | 2 +
arch/x86/kernel/Makefile | 7 +
arch/x86/kernel/cpu/common.c | 2 -
arch/x86/kernel/head64.c | 206 +++++++-------------
arch/x86/kernel/head_64.S | 156 +++++----------
arch/x86/kernel/sev-shared.c | 54 +++--
arch/x86/kernel/sev.c | 27 ++-
arch/x86/kernel/vmlinux.lds.S | 3 +-
arch/x86/lib/Makefile | 13 --
arch/x86/lib/memcpy_64.S | 3 +-
arch/x86/lib/memset_64.S | 3 +-
arch/x86/lib/retpoline.S | 2 +-
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/kasan_init_64.c | 3 -
arch/x86/mm/mem_encrypt_boot.S | 3 +-
arch/x86/mm/mem_encrypt_identity.c | 98 +++-------
drivers/firmware/efi/libstub/efi-stub-helper.c | 8 +
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 6 +
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/init.h | 12 ++
scripts/mod/modpost.c | 11 +-
tools/objtool/check.c | 26 +--
37 files changed, 319 insertions(+), 438 deletions(-)
base-commit: aa8eff72842021f52600392b245fb82d113afa8a
--
2.43.0.429.g432eaa2c6b-goog