Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a: WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap

From: Oliver Sang
Date: Wed May 08 2024 - 04:08:47 EST




hi, Boris,


very sorry that this could be a wrong report due to BIOS issues.
please just ignore it.


thanks a lot for reminding us below:

> BUT(!), look what the original dmesg said:
>
> [ 0.055225][ T0] x86/cpu: VMX (outside TXT) disabled by BIOS
>
> So that FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX bit was not set back then. Why?
>
> Oliver, have you done any BIOS config changes in the meantime?

we really don't do any manual change recently, however, per your reminding,
we rerun tests on both ee8962082a and its parent v6.9-rc3 again.

while we made original report, we always saw below for both commits.
"x86/cpu: VMX (outside TXT) disabled by BIOS"

by rerun today, we cannot see it again, on both commits.

then for ee8962082a, the reported
WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap
disappeared also while no "x86/cpu: VMX (outside TXT) disabled by BIOS"

we are using kexec to boot into kernel, and the kernel images keep same,
so we are sure this strage phenomenon is not caused by kernel build.

now we doubt it's really a bios issue, we found another similar machine,
which also show "x86/cpu: VMX (outside TXT) disabled by BIOS", but after
several rounds of cold reboot, the message disappeared, too.

we will investigate this BIOS further and avoid this kind of misleading
report in the future.

sorry again :(


On Tue, May 07, 2024 at 01:48:14PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@xxxxxxxxx>
> To: Oliver Sang <oliver.sang@xxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>, oe-lkp@xxxxxxxxxxxxxxx,
> lkp@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx,
> Ingo Molnar <mingo@xxxxxxxxxx>, Srikanth Aithal <sraithal@xxxxxxx>
> Bcc: bp@xxxxxxxxx
> Subject: Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a:
> WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap
> Reply-To:
> In-Reply-To: <ZjnTW4XQwVHEiSaW@xsang-OptiPlex-9020>
>
> On Tue, May 07, 2024 at 03:08:11PM +0800, Oliver Sang wrote:
> > I applied the debug pach ontop of lastest Linus master:
> >
> > 1621a826233a7 debug patch from Boris for ee8962082a
> > dccb07f2914cd (HEAD, linus/master) Merge tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> >
> > attached dmesg and cpuinfo (a little diff, so I attached it again)
>
> Thanks, now what are we seeing here:
>
> [ 0.763720][ T0] x86/cpu: init_ia32_feat_ctl: CPU0: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU1: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU2: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU3: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU4: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU5: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU6: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU7: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU8: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU9: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU10: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU11: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU12: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU13: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU14: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU15: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU16: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU17: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU18: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU19: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU20: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU21: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU22: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU23: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU24: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU25: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU26: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU27: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU28: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU29: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU30: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU31: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU32: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU33: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU34: FEAT_CTL: 0x5, tboot: 0
> [ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU35: FEAT_CTL: 0x5, tboot: 0
>
> So following the code in init_ia32_feat_ctl(), the BSP'll get to
>
> if (msr & FEAT_CTL_LOCKED)
> goto update_caps;
>
> and that is the case - FEAT_CTL_LOCKED, bit 0, is set.
>
> It'll go to the update_caps label and there it'll do:
>
> if (!cpu_has(c, X86_FEATURE_VMX))
> goto update_sgx;
>
> VMX is set if I judge by the attached cpuinfo-2 so the next check comes:
>
> if ( (tboot && !(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)) ||
> (!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))) {
> if (IS_ENABLED(CONFIG_KVM_INTEL))
> pr_err_once("VMX (%s TXT) disabled by BIOS\n",
> tboot ? "inside" : "outside");
> clear_cpu_cap(c, X86_FEATURE_VMX);
>
> tboot is 0, so the second conditional:
>
> (!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))
>
> FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX, bit 2 is set. So that conditional is
> not true either. And the pr_err_once() doesn't appear in dmesg.
>
> BUT(!), look what the original dmesg said:
>
> [ 0.055225][ T0] x86/cpu: VMX (outside TXT) disabled by BIOS
>
> So that FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX bit was not set back then. Why?
>
> Oliver, have you done any BIOS config changes in the meantime?
>
> This all looks really weird.
>
> The other possibility would be if something changed between -rc3
> (the branch x86/alternatives is based on) and -rc7. Unlikely but by now
> everything's possible.
>
> What could also be the case is, the BSP's FEAT_CTL is 0x0 (unconfigured,
> whatever), we'd go in, set FEAT_CTL_LOCKED and that'll lock the bit in
> all FEAT_CTLs on all cores, then it'll set
> FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX. I'm presuming microcode forces this
> and am obviously grasping at straws...
>
> Then CPU1 will come, FEAT_CTL_LOCKED will be set but
> FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX won't be set, leading to the warn.
>
> But then again, SDM says that that MSR's scope is "Thread" which means,
> locking that MSR on the BSP won't have effect on the same MSR on the
> other HT thread.
>
> Weird.
>
> Ok, here's a bit modified debug patch ontop of the alternatives branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=feat_ctl
>
> please run it and send me dmesg again.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette