Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a: WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap

From: Borislav Petkov
Date: Tue May 07 2024 - 07:48:59 EST


From: Borislav Petkov <bp@xxxxxxxxx>
To: Oliver Sang <oliver.sang@xxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>, oe-lkp@xxxxxxxxxxxxxxx,
lkp@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx,
Ingo Molnar <mingo@xxxxxxxxxx>, Srikanth Aithal <sraithal@xxxxxxx>
Bcc: bp@xxxxxxxxx
Subject: Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a:
WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap
Reply-To:
In-Reply-To: <ZjnTW4XQwVHEiSaW@xsang-OptiPlex-9020>

On Tue, May 07, 2024 at 03:08:11PM +0800, Oliver Sang wrote:
> I applied the debug pach ontop of lastest Linus master:
>
> 1621a826233a7 debug patch from Boris for ee8962082a
> dccb07f2914cd (HEAD, linus/master) Merge tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>
> attached dmesg and cpuinfo (a little diff, so I attached it again)

Thanks, now what are we seeing here:

[ 0.763720][ T0] x86/cpu: init_ia32_feat_ctl: CPU0: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU1: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU2: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU3: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU4: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU5: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU6: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU7: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU8: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU9: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU10: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU11: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU12: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU13: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU14: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU15: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU16: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU17: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU18: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU19: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU20: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU21: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU22: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU23: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU24: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU25: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU26: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU27: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU28: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU29: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU30: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU31: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU32: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU33: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU34: FEAT_CTL: 0x5, tboot: 0
[ 0.055231][ T0] x86/cpu: init_ia32_feat_ctl: CPU35: FEAT_CTL: 0x5, tboot: 0

So following the code in init_ia32_feat_ctl(), the BSP'll get to

if (msr & FEAT_CTL_LOCKED)
goto update_caps;

and that is the case - FEAT_CTL_LOCKED, bit 0, is set.

It'll go to the update_caps label and there it'll do:

if (!cpu_has(c, X86_FEATURE_VMX))
goto update_sgx;

VMX is set if I judge by the attached cpuinfo-2 so the next check comes:

if ( (tboot && !(msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX)) ||
(!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))) {
if (IS_ENABLED(CONFIG_KVM_INTEL))
pr_err_once("VMX (%s TXT) disabled by BIOS\n",
tboot ? "inside" : "outside");
clear_cpu_cap(c, X86_FEATURE_VMX);

tboot is 0, so the second conditional:

(!tboot && !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))

FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX, bit 2 is set. So that conditional is
not true either. And the pr_err_once() doesn't appear in dmesg.

BUT(!), look what the original dmesg said:

[ 0.055225][ T0] x86/cpu: VMX (outside TXT) disabled by BIOS

So that FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX bit was not set back then. Why?

Oliver, have you done any BIOS config changes in the meantime?

This all looks really weird.

The other possibility would be if something changed between -rc3
(the branch x86/alternatives is based on) and -rc7. Unlikely but by now
everything's possible.

What could also be the case is, the BSP's FEAT_CTL is 0x0 (unconfigured,
whatever), we'd go in, set FEAT_CTL_LOCKED and that'll lock the bit in
all FEAT_CTLs on all cores, then it'll set
FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX. I'm presuming microcode forces this
and am obviously grasping at straws...

Then CPU1 will come, FEAT_CTL_LOCKED will be set but
FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX won't be set, leading to the warn.

But then again, SDM says that that MSR's scope is "Thread" which means,
locking that MSR on the BSP won't have effect on the same MSR on the
other HT thread.

Weird.

Ok, here's a bit modified debug patch ontop of the alternatives branch:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=feat_ctl

please run it and send me dmesg again.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette