Re: [PATCHv5 23/30] x86/boot: Avoid #VE during boot for TDX platforms

From: Kirill A. Shutemov
Date: Tue Mar 08 2022 - 11:42:45 EST


On Tue, Mar 08, 2022 at 09:19:06AM +0800, Xiaoyao Li wrote:
> On 3/8/2022 6:33 AM, Kirill A. Shutemov wrote:
> > On Mon, Mar 07, 2022 at 05:29:27PM +0800, Xiaoyao Li wrote:
> ...
> > > Even though CPUID reports MCE is supported, all the access to MCE related
> > > MSRs causes #VE. If they are accessed via mce_rdmsrl(), the #VE will be
> > > fixed up and goes to ex_handler_msr_mce(). Finally lead to panic().
> >
> > It is not panic, but warning. Like this:
> >
> > unchecked MSR access error: RDMSR from 0x179 at rIP: 0xffffffff810df1e9 (__mcheck_cpu_cap_init+0x9/0x130)
> > Call Trace:
> > <TASK>
> > mcheck_cpu_init+0x3d/0x2c0
> > identify_cpu+0x85a/0x910
> > identify_boot_cpu+0xc/0x98
> > check_bugs+0x6/0xa7
> > start_kernel+0x363/0x3d1
> > secondary_startup_64_no_verify+0xe5/0xeb
> > </TASK>
> >
> > It is annoying, but not fatal. The patchset is big enough as it is.
> > I tried to keep patch number under control.
> >
>
> I did hit panic as below.
>
> [ 0.578792] mce: MSR access error: RDMSR from 0x475 at rIP:
> 0xffffffffb94daa92 (mce_rdmsrl+0x22/0x60)
> [ 0.578792] Call Trace:
> [ 0.578792] <TASK>
> [ 0.578792] machine_check_poll+0xf0/0x260
> [ 0.578792] __mcheck_cpu_init_generic+0x3d/0xb0
> [ 0.578792] mcheck_cpu_init+0x16b/0x4a0
> [ 0.578792] identify_cpu+0x467/0x5c0
> [ 0.578792] identify_boot_cpu+0x10/0x9a
> [ 0.578792] check_bugs+0x2a/0xa06
> [ 0.578792] start_kernel+0x6bc/0x6f1
> [ 0.578792] x86_64_start_reservations+0x24/0x26
> [ 0.578792] x86_64_start_kernel+0xad/0xb2
> [ 0.578792] secondary_startup_64_no_verify+0xe4/0xeb
> [ 0.578792] </TASK>
> [ 0.578792] Kernel panic - not syncing: MCA architectural violation!
> [ 0.578792] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 5.17.0-rc5-td-guest-upstream+ #2
> [ 0.578792] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 0.0.0 02/06/2015
> [ 0.578792] Call Trace:
> [ 0.578792] <TASK>
> [ 0.578792] dump_stack_lvl+0x49/0x5f
> [ 0.578792] dump_stack+0x10/0x12
> [ 0.578792] panic+0xf9/0x2d0
> [ 0.578792] ex_handler_msr_mce+0x5e/0x5e
> [ 0.578792] fixup_exception+0x2f4/0x310
> [ 0.578792] exc_virtualization_exception+0x9b/0x100
> [ 0.578792] asm_exc_virtualization_exception+0x12/0x40
> [ 0.578792] RIP: 0010:mce_rdmsrl+0x22/0x60
> [ 0.578792] Code: a0 b9 e8 75 4d fb ff 90 55 48 89 e5 41 54 53 89 fb 48
> c7 c7 9c c1 f6 b9 e8 4b 28 00 00 65 8a 05 97 52 b4 46 84 c0 75 10 89 d9 <0f>
> 32 48 c1 e2 20 48 09 d0 5b 41 5c 5d c3 89 df e8 c9 5a 17 ff 4c
> [ 0.578792] RSP: 0000:ffffffffba203cd8 EFLAGS: 00010246
> [ 0.578792] RAX: 0000000000000000 RBX: 0000000000000475 RCX:
> 0000000000000475
> [ 0.578792] RDX: 00000000000001d0 RSI: ffffffffb9f6c19c RDI:
> ffffffffb9ece016
> [ 0.578792] RBP: ffffffffba203ce8 R08: ffffffffba203cb0 R09:
> ffffffffba203cb4
> [ 0.578792] R10: 0000000000000000 R11: 000000000000000f R12:
> 0000000000000001
> [ 0.578792] R13: ffffffffba203dc0 R14: 000000000000000a R15:
> 000000000000001d
> [ 0.578792] ? mce_rdmsrl+0x15/0x60
> [ 0.578792] machine_check_poll+0xf0/0x260
> [ 0.578792] __mcheck_cpu_init_generic+0x3d/0xb0
> [ 0.578792] mcheck_cpu_init+0x16b/0x4a0
> [ 0.578792] identify_cpu+0x467/0x5c0
> [ 0.578792] identify_boot_cpu+0x10/0x9a
> [ 0.578792] check_bugs+0x2a/0xa06
> [ 0.578792] start_kernel+0x6bc/0x6f1
> [ 0.578792] x86_64_start_reservations+0x24/0x26
> [ 0.578792] x86_64_start_kernel+0xad/0xb2
> [ 0.578792] secondary_startup_64_no_verify+0xe4/0xeb
> [ 0.578792] </TASK>
> [ 0.578792] ---[ end Kernel panic - not syncing: MCA architectural

Hm. Does your read of MSR_IA32_MCG_CAP succeed?

Otherwise you should never enter the loop in machine_check_poll(),
because mce_num_banks would be 0. In that case MSR 0x475 is never touched.
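
To spell out the arithmetic: 0x475 is IA32_MC29_STATUS (0x401 + 4 * 29), so it
can only be reached if the bank count taken from MCG_CAP is at least 30. Below
is a rough userspace sketch of that reasoning, not the kernel code. The MSR
numbers and the bank-count mask are the architectural ones; bank_count() and
poll_would_touch() are hypothetical helpers, and the sketch assumes a faulting
MCG_CAP read leaves the value at 0.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_MCG_CAP    0x179u  /* global MCA capabilities */
#define MSR_IA32_MC0_STATUS 0x401u  /* STATUS register of bank 0 */
#define MCG_BANKCNT_MASK    0xffu   /* MCG_CAP[7:0]: number of banks */

/* MSR address of IA32_MCi_STATUS; each bank occupies four consecutive MSRs. */
static uint32_t mc_status_msr(unsigned int bank)
{
	assert(bank <= MCG_BANKCNT_MASK);
	return MSR_IA32_MC0_STATUS + 4u * bank;
}

/*
 * Bank count derived from an MCG_CAP value.
 * Assumption: a read that faulted is represented here as mcg_cap == 0.
 */
static unsigned int bank_count(uint64_t mcg_cap)
{
	return (unsigned int)(mcg_cap & MCG_BANKCNT_MASK);
}

/*
 * Hypothetical stand-in for the per-bank poll loop: returns 1 if walking
 * all banks reported by mcg_cap would read the STATUS MSR 'msr', else 0.
 */
static int poll_would_touch(uint64_t mcg_cap, uint32_t msr)
{
	unsigned int i, banks = bank_count(mcg_cap);

	for (i = 0; i < banks; i++)
		if (mc_status_msr(i) == msr)
			return 1;
	return 0;
}
```

With a zero MCG_CAP the loop body never runs, so poll_would_touch(0, 0x475)
is 0. A read of 0x475 implies MCG_CAP came back with a bank count of 30 or
more.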

Anyway, the patchset is not intended to be a complete enabling of TDX. There
are a lot of rough corners to smooth out before it is production-ready. Let's
keep it as it is.

--
Kirill A. Shutemov