Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

From: Dave Hansen
Date: Fri Dec 01 2023 - 15:36:02 EST


On 11/9/23 03:55, Kai Huang wrote:
> +static bool is_pamt_page(unsigned long phys)
> +{
> + struct tdmr_info_list *tdmr_list = &tdx_tdmr_list;
> + int i;
> +
> + /*
> + * This function is called from #MC handler, and theoretically
> + * it could run in parallel with the TDX module initialization
> + * on other logical cpus. But it's not OK to hold mutex here
> + * so just blindly check module status to make sure PAMTs/TDMRs
> + * are stable to access.
> + *
> + * This may return inaccurate result in rare cases, e.g., when
> + * #MC happens on a PAMT page during module initialization, but
> + * this is fine as #MC handler doesn't need a 100% accurate
> + * result.
> + */

It doesn't need perfect accuracy. But how do we know it's not going
to, for instance, chase a bad pointer?

> + if (tdx_module_status != TDX_MODULE_INITIALIZED)
> + return false;

As an example, what prevents this CPU from observing
tdx_module_status==TDX_MODULE_INITIALIZED while the PAMT structure is
being assembled?

> + for (i = 0; i < tdmr_list->nr_consumed_tdmrs; i++) {
> + unsigned long base, size;
> +
> + tdmr_get_pamt(tdmr_entry(tdmr_list, i), &base, &size);
> +
> + if (phys >= base && phys < (base + size))
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/*
> + * Return whether the memory page at the given physical address is TDX
> + * private memory or not. Called from #MC handler do_machine_check().
> + *
> + * Note this function may not return an accurate result in rare cases.
> + * This is fine as the #MC handler doesn't need a 100% accurate result,
> + * because it cannot distinguish #MC between software bug and real
> + * hardware error anyway.
> + */
> +bool tdx_is_private_mem(unsigned long phys)
> +{
> + struct tdx_module_args args = {
> + .rcx = phys & PAGE_MASK,
> + };
> + u64 sret;
> +
> + if (!platform_tdx_enabled())
> + return false;
> +
> + /* Get page type from the TDX module */
> + sret = __seamcall_ret(TDH_PHYMEM_PAGE_RDMD, &args);
> + /*
> + * Handle the case that CPU isn't in VMX operation.
> + *
> + * KVM guarantees no VM is running (thus no TDX guest)
> + * when there's any online CPU isn't in VMX operation.
> + * This means there will be no TDX guest private memory
> + * and Secure-EPT pages. However the TDX module may have
> + * been initialized and the memory page could be PAMT.
> + */
> + if (sret == TDX_SEAMCALL_UD)
> + return is_pamt_page(phys);

Either this comment is wonky or the module initialization is buggy.

config_global_keyid() goes and does SEAMCALLs on all CPUs. There are
zero checks or special handling in there for whether the CPU has done
VMXON. So, by the time we've started initializing the TDX module
(including the PAMT), all online CPUs must be able to do SEAMCALLs. Right?

So how can we have a working PAMT here when this CPU can't do SEAMCALLs?

I don't think we should even bother with this complexity. I think we
can just fix the whole thing by saying that unless you can make a
non-init SEAMCALL, we just assume the memory can't be private.

The transition to being able to make non-init SEAMCALLs is also #MC safe
*and* it's at a point when the tdmr_list is stable.

Can anyone shoot any holes in that? :)