Re: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error

From: Dave Hansen
Date: Tue Nov 29 2022 - 16:41:16 EST


On 11/22/22 11:33, Peter Zijlstra wrote:
> On Tue, Nov 22, 2022 at 11:24:48AM -0800, Dave Hansen wrote:
>>> Not intialize TDX on busy NOHZ_FULL cpus and hard-limit the cpumask of
>>> all TDX using tasks.
>> I don't think that works. As I mentioned to Thomas elsewhere, you don't
>> just need to initialize TDX on the CPUs where it is used. Before the
>> module will start working you need to initialize it on *all* the CPUs it
>> knows about. The module itself has a little counter where it tracks
>> this and will refuse to start being useful until it gets called
>> thoroughly enough.
> That's bloody terrible, that is. How are we going to make that work with
> the SMT mitigation crud that forces the SMT sibilng offline?
>
> Then the counters don't match and TDX won't work.
>
> Can we get this limitiation removed and simply let the module throw a
> wobbly (error) when someone tries and use TDX without that logical CPU
> having been properly initialized?

It sounds like we can at least punt the limitation away from the OS's
purview.

There's actually a multi-step process to get a "real" TDX module loaded.
There's a fancy ACM (Authenticated Code Module) that's invoked via
GETSEC[ENTERACCS] and an intermediate module loader. That dance used to
be done in the kernel, but we talked the BIOS guys into doing it instead.

I believe these per-logical-CPU checks _can_ also be punted out of the
TDX module itself and delegated to one of these earlier module loading
phases that the BIOS drives.

I'm still a _bit_ skeptical that the checks are needed in the first
place. But, as long as they're hidden from the OS, I don't see a need
to be too cranky about it.

In the end, we could just plain stop doing the TDH.SYS.LP.INIT code in
the kernel.

Unless someone screams, I'll ask the BIOS and TDX module folks to look
into this.