Re: [PATCH v5 07/22] x86/virt/tdx: Implement SEAMCALL function

From: Kai Huang
Date: Wed Jul 20 2022 - 06:18:23 EST


On Tue, 2022-06-28 at 10:10 +1200, Kai Huang wrote:
> On Mon, 2022-06-27 at 13:58 -0700, Dave Hansen wrote:
> > On 6/26/22 22:23, Kai Huang wrote:
> > > On Fri, 2022-06-24 at 11:38 -0700, Dave Hansen wrote:
> > > > On 6/22/22 04:16, Kai Huang wrote:
> > > > > SEAMCALL instruction causes #GP when SEAMRR isn't enabled, and #UD when
> > > > > CPU is not in VMX operation. The TDX_MODULE_CALL macro doesn't handle
> > > > > SEAMCALL exceptions. Leave to the caller to guarantee those conditions
> > > > > before calling __seamcall().
> > > >
> > > > I was trying to make the argument earlier that you don't need *ANY*
> > > > detection for TDX, other than the ability to make a SEAMCALL.
> > > > Basically, patch 01/22 could go away.
> > ...
> > > > So what does patch 01/22 buy us? One EXTABLE entry?
> > >
> > > There are below pros if we can detect whether TDX is enabled by BIOS during boot
> > > before initializing the TDX Module:
> > >
> > > 1) There are requirements from customers to report whether platform supports TDX
> > > and the TDX keyID numbers before initializing the TDX module so the userspace
> > > cloud software can use this information to do something. Sorry I cannot find
> > > the lore link now.
> >
> > <sigh>
> >
> > Never listen to customers literally. It'll just lead you down the wrong
> > path. They told you, "we need $FOO in dmesg" and you ran with it
> > without understanding why. The fact that you even *need* to find the
> > lore link is because you didn't bother to realize what they really needed.
> >
> > dmesg is not ABI. It's for humans. If you need data out of the kernel,
> > do it with a *REAL* ABI. Not dmesg.
>
> Showing in the dmesg is the first step, but later we have plan to expose keyID
> info via /sysfs. Of course, it's always arguable customer's such requirement is
> absolutely needed, but to me it's still a good thing to have code to detect TDX
> during boot. The code isn't complicated as you can see.
>
> >
> > > 2) As you can see, it can be used to handle ACPI CPU/memory hotplug and driver
> > > managed memory hotplug. Kexec() support patch also can use it.
> > >
> > > Particularly, in concept, ACPI CPU/memory hotplug is only related to whether TDX
> > > is enabled by BIOS, but not whether TDX module is loaded, or the result of
> > > initializing the TDX module. So I think we should have some code to detect TDX
> > > during boot.
> >
> > This is *EXACTLY* why our colleagues at Intel needs to tell us about
> > what the OS and firmware should do when TDX is in varying states of decay.
>
> Yes I am working on it to make it public.
>
> >
> > Does the mere presence of the TDX module prevent hotplug?  
> >
>
> For ACPI CPU hotplug, yes. The TDX module even doesn't need to be loaded.
> Whether SEAMRR is enabled determines.
>
> For ACPI memory hotplug, in practice yes. For architectural behaviour, I'll
> work with others internally to get some public statement.
>
> > Or, if a
> > system has the TDX module loaded but no intent to ever use TDX, why
> > can't it just use hotplug like a normal system which is not addled with
> > the TDX albatross around its neck?
>
> I think if a machine has enabled TDX in the BIOS, the user of the machine very
> likely has intention to actually use TDX.
>
> Yes for driver-managed memory hotplug, it makes sense if user doesn't want to
> use TDX, it's better to not disable it. But to me it's also not a disaster if
> we just disable driver-managed memory hotplug if TDX is enabled by BIOS.
>
> For ACPI memory hotplug, I think in practice we can treat it as BIOS bug, but
> I'll get some public statement around this.
>

Hi Dave,

Try to close on how to handle memory hotplug. After discussion, below will be
architectural behaviour of TDX in terms of ACPI memory hotplug:

1) During platform boot, CMRs must be physically present. MCHECK verifies all
CMRs are physically present and are actually TDX convertible memory.
2) CMRs are static after platform boots and don't change at runtime.  TDX
architecture doesn't support hot-add or hot-removal of CMR memory.
3) TDX architecture doesn't forbid non-CMR memory hotplug.

Also, although TDX doesn't trust BIOS in terms of security, a non-buggy BIOS
should prevent CMR memory from being hot-removed. If kernel ever receives such
event, it's a BIOS bug, or even worse, the BIOS is compromised and under attack.

As a result, the kernel should also never receive event of hot-add CMR memory.
It is very much likely TDX is under attack (physical attack) in such case, i.e.
someone is trying to physically replace any CMR memory.

In terms of how to handle ACPI memory hotplug, my thinking is -- ideally, if the
kernel can get the CMRs during kernel boot when detecting whether TDX is enabled
by BIOS, we can do below:

- For memory hot-removal, if the removed memory falls into any CMR, then kernel
can speak loudly it is a BIOS bug. But when this happens, the hot-removal has
been handled by BIOS thus kernel cannot actually prevent, so kernel can either
BUG(), or just print error message. If the removed memory doesn't fall into
CMR, we do nothing.

- For memory hot-add, if the new memory falls into any CMR, then kernel should
speak loudly it is a BIOS bug, or even say "TDX is under attack" as this is only
possible when CMR memory has been previously hot-removed. And kernel should
reject the new memory for security reason. If the new memory doesn't fall into
any CMR, then we (also) just reject the new memory, as we want to guarantee all
memory in page allocator are TDX pages. But this is basically due to kernel
policy but not due to TDX architecture.

BUT, since as the first step, we cannot get the CMR during kernel boot (as it
requires additional code to put CPU into VMX operation), I think for now we can
handle ACPI memory hotplug in below way:

- For memory hot-removal, we do nothing.
- For memory hot-add, we simply reject the new memory when TDX is enabled by
BIOS. This not only prevents the potential "physical attack of replacing any
CMR memory", but also makes sure no non-CMR memory will be added to page
allocator during runtime via ACPI memory hot-add.

We can improve this in next stage when we can get CMRs during kernel boot.

For the concern that on a TDX BIOS enabled system, people may not want to use
TDX at all but just use it as normal system, as I replied to Dan regarding to
the driver-managed memory hotplug, we can provide a kernel commandline, i.e.
use_tdx={on|off}, to allow user to *choose* between TDX and memory hotplug.
When use_tdx=off, we continue to allow memory hotplug and driver-managed hotplug
as normal but refuse to initialize TDX module.

Any comments?