Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

From: Huang, Kai
Date: Tue Dec 05 2023 - 15:08:42 EST


On Tue, 2023-12-05 at 20:56 +0100, Borislav Petkov wrote:
> On Tue, Dec 05, 2023 at 07:41:41PM +0000, Huang, Kai wrote:
> > -static const char *mce_memory_info(struct mce *m)
> > +static const char *mce_dump_aux_info(struct mce *m)
> > {
> > - if (!m || !mce_is_memory_error(m) || !mce_usable_address(m))
> > - return NULL;
> > -
> > /*
> > - * Certain initial generations of TDX-capable CPUs have an
> > - * erratum. A kernel non-temporal partial write to TDX private
> > - * memory poisons that memory, and a subsequent read of that
> > - * memory triggers #MC.
> > - *
> > - * However such #MC caused by software cannot be distinguished
> > - * from the real hardware #MC. Just print additional message
> > - * to show such #MC may be result of the CPU erratum.
> > + * Confidential computing platforms such as TDX platforms
> > + * may occur MCE due to incorrect access to confidential
> > + * memory. Print additional information for such error.
> > */
> > - if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> > + if (!m || !mce_is_memory_error(m) || !mce_usable_address(m))
> > return NULL;
>
> What's the point of doing this on !TDX? None.

OK. I'll move this inside tdx_get_mce_info().

>
> > - return !tdx_is_private_mem(m->addr) ? NULL :
> > - "TDX private memory error. Possible kernel bug.";
> > + if (platform_tdx_enabled())
>
> So is this the "host is TDX" check?
>
> Not a X86_FEATURE flag but something homegrown. And Kirill is trying to
> switch the CC_ATTRs to X86_FEATURE_ flags for SEV but here you guys are
> using something homegrown.
>
> why not a X86_FEATURE_ flag?
>

The difference is for TDX host the kernel needs to initialize the TDX module
first before TDX can be used. The module initialization is done at runtime, and
the platform_tdx_enabled() here only returns whether the BIOS has enabled TDX.

IIUC the X86_FEATURE_ flag doesn't suit this purpose because based on my
understanding the flag being present means the kernel has done some enabling
work and the feature is ready to use.