Re: Skylake early panic after GDS mark

From: Dave Hansen
Date: Wed Oct 11 2023 - 12:46:35 EST


... adding some mailing lists

On 10/11/23 08:48, Mike Pagano wrote:
> Hello, Dave,
>
> I get a very early kernel panic with commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/common.c?id=c9f4c45c8ec3f07f4f083f9750032a1ec3eab6b2
>
> I reverted this and the system boots
>
> $ dmesg | grep -i microcode
> [    0.000000] microcode: updated early: 0xc2 -> 0xf2, date = 2023-01-02
> [    0.528415] microcode: Microcode Update Driver: v2.2.

You've probably got a bug in your init binary. The instruction that
puked shows up as:

	vmovd xmm2,esi

... and it's in userspace. You don't have a microcode mitigation for
GDS, so you're probably set to use GDS_MITIGATION_FORCE, which means the
kernel is disabling AVX:

> 	/* No microcode */
> 	if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL)) {
> 		if (gds_mitigation == GDS_MITIGATION_FORCE) {
> 			/*
> 			 * This only needs to be done on the boot CPU so do it
> 			 * here rather than in update_gds_msr()
> 			 */
> 			setup_clear_cpu_cap(X86_FEATURE_AVX);
> 			pr_warn("Microcode update needed! Disabling AVX as mitigation.\n");
> 		} else {
> 			gds_mitigation = GDS_MITIGATION_UCODE_NEEDED;
> 		}
> 		goto out;
> 	}

First, you should file a bug against your init binary, since it's
probably not doing AVX enumeration properly. It needs a fallback path
for when AVX isn't available.
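
For reference, here is a rough sketch of what proper userspace AVX
enumeration could look like. It assumes GCC or Clang on x86, and the
names avx_usable() and read_xcr0() are made up for illustration, not
taken from any init project. The point is that the CPUID AVX bit alone
is not enough: as far as I understand it, when the kernel clears
X86_FEATURE_AVX it leaves the AVX state bit in XCR0 disabled while
CPUID still advertises AVX, so userspace has to check OSXSAVE and XCR0
as well.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0. Only legal once CPUID has reported OSXSAVE. */
static uint64_t read_xcr0(void)
{
	uint32_t eax, edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
	return ((uint64_t)edx << 32) | eax;
}

/* True only if the CPU has AVX *and* the OS has enabled its state. */
static bool avx_usable(void)
{
	unsigned int eax, ebx, ecx, edx;
	/* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX */
	const unsigned int need = (1u << 27) | (1u << 28);

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	if ((ecx & need) != need)
		return false;

	/* XCR0 bit 1 = SSE state, bit 2 = AVX state; both must be set. */
	return (read_xcr0() & 0x6) == 0x6;
}
#else
/* Hypothetical fallback so the sketch still builds on non-x86. */
static bool avx_usable(void)
{
	return false;
}
#endif
```

A caller would test avx_usable() once at startup and pick the non-AVX
code path when it returns false, instead of executing VEX-encoded
instructions like the vmovd above unconditionally.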

Second, you can work around this with:

	gather_data_sampling=off

That's not great because you'll still be exposed to the vulnerability.
But you can at least boot.
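
If you want to confirm what state you ended up in after booting, the
kernel reports it in sysfs. Below is a minimal sketch that reads that
file; read_gds_status() is a made-up helper name, and the file only
exists on kernels that carry the GDS mitigation patches. With the
workaround above I would expect the status to read "Vulnerable", but
treat that as an assumption and check the hw-vuln documentation for the
exact strings.

```c
#include <stdio.h>
#include <string.h>

#define GDS_SYSFS "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"

/* Copy the kernel's one-line GDS status into buf; 0 on success, -1 on failure. */
static int read_gds_status(char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(GDS_SYSFS, "r");
	if (!f)
		return -1;	/* e.g. older kernel or non-x86 system */

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

From the shell, running cat on the same path gives the same one-line
answer.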