[patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment()

From: Thomas Gleixner
Date: Sat Jul 16 2022 - 19:17:30 EST


load_percpu_segment() is using wrmsrl() which is paravirtualized. That's an
issue because the code sequence is:

__loadsegment_simple(gs, 0);
wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));

So anything which uses a per CPU variable between setting GS to 0 and
writing GSBASE is going to end up in a NULL pointer dereference. That
can be triggered with instrumentation and is guaranteed to be triggered
with callthunks for call depth tracking.

Use native_wrmsrl() instead. XEN_PV will trap and emulate, but that's not a
hot path.
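
The window described above can be modeled in user space. The following is
only a rough sketch, not kernel code: every model_* name is hypothetical,
a plain pointer stands in for GSBASE, and a counter stands in for the fault
that a real GS-relative access would take while the base is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2

struct model_percpu {
	uint64_t call_depth;
};

static struct model_percpu model_areas[MODEL_NR_CPUS];
static struct model_percpu *model_gs_base;	/* stands in for GSBASE */
static unsigned int model_null_derefs;		/* accesses seen with a NULL base */

/* Stand-in for a GS-relative access. The real thing faults; this counts. */
static void model_percpu_touch(void)
{
	if (!model_gs_base) {
		model_null_derefs++;
		return;
	}
	model_gs_base->call_depth++;
}

/* Direct write, analogous to native_wrmsrl(): no per-CPU access on the way. */
static void model_native_write_base(struct model_percpu *base)
{
	model_gs_base = base;
}

/*
 * Indirect write, analogous to a paravirtualized or instrumented wrmsrl():
 * something on the call path touches per-CPU data before the base is set.
 */
static void model_instrumented_write_base(struct model_percpu *base)
{
	model_percpu_touch();
	model_gs_base = base;
}

/* Returns how many accesses hit the NULL base during the reload. */
static unsigned int model_load_percpu_segment(int cpu, int use_native)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	model_null_derefs = 0;
	model_gs_base = NULL;	/* models __loadsegment_simple(gs, 0) */

	if (use_native)
		model_native_write_base(&model_areas[cpu]);
	else
		model_instrumented_write_base(&model_areas[cpu]);

	return model_null_derefs;
}
```

In this model, model_load_percpu_segment(0, 0) reports one access against
the NULL base, while model_load_percpu_segment(0, 1) reports none, which is
the property the switch to native_wrmsrl() is after.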

Also make it static and mark it noinstr so neither kprobes, sanitizers nor
anything else can touch it.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
---
arch/x86/include/asm/processor.h | 1 -
arch/x86/kernel/cpu/common.c | 12 ++++++++++--
2 files changed, 10 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -673,7 +673,6 @@ extern struct desc_ptr early_gdt_descr;
extern void switch_to_new_gdt(int);
extern void load_direct_gdt(int);
extern void load_fixmap_gdt(int);
-extern void load_percpu_segment(int);
extern void cpu_init(void);
extern void cpu_init_secondary(void);
extern void cpu_init_exception_handling(void);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -701,13 +701,21 @@ static const char *table_lookup_model(st
__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));

-void load_percpu_segment(int cpu)
+static noinstr void load_percpu_segment(int cpu)
{
#ifdef CONFIG_X86_32
loadsegment(fs, __KERNEL_PERCPU);
#else
__loadsegment_simple(gs, 0);
- wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
+ /*
+ * Because of the __loadsegment_simple(gs, 0) above, any GS-prefixed
+ * instruction will explode right about here. As such, we must not have
+ * any CALL-thunks using per-cpu data.
+ *
+ * Therefore, use native_wrmsrl() and have XenPV take the fault and
+ * emulate.
+ */
+ native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
#endif
}