Re: [PATCH v6 00/21] TDX host kernel support

From: Huang, Kai
Date: Wed Oct 26 2022 - 19:52:06 EST


On Wed, 2022-10-26 at 16:26 -0700, Dave Hansen wrote:
> On 10/26/22 16:15, Kai Huang wrote:
> > To keep things simple, this series doesn't handle memory hotplug at all,
> > but depends on the machine owner to not do any memory hotplug operation.
> > For example, the machine owner should not plug any NVDIMM and CXL memory
> > into the machine, or use kmem driver to plug NVDIMM or CXL memory to the
> > core-mm.
> >
> > This will be enhanced in the future after first submission. We are also
> > looking into options on how to handle:
>
> This is also known as the "hopes and prayers" approach to software
> enabling. "Let's just hope and pray that nobody does these things which
> we know are broken."
>
> In the spirit of moving this submission forward, I'm willing to continue
> to _review_ this series.
>

Thank you Dave!

> But, I don't think it can go upstream until it
> contains at least _some_ way to handle memory hotplug.
>
>

Yes I agree.

One intention of sending out this series is actually to get feedback on how to
handle it. As mentioned in the cover letter, AFAICT we have two options:

1) Have the kernel always guarantee that all pages in the page allocator are
TDX memory (i.e. by rejecting non-TDX memory in memory hotplug). Non-TDX
memory can still be used via devdax.
2) Manage TDX and non-TDX memory in different NUMA nodes, and use a per-node
TDX memory capability flag to show which nodes are TDX-capable. Userspace needs
to explicitly bind TDX guests to those TDX-capable NUMA nodes.
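
To make option 1 a bit more concrete, below is a minimal user-space sketch of
the kind of check a memory hotplug path could do. It is not code from this
series: the struct, the table contents and the function names are all
hypothetical, and it simply assumes the kernel has a sorted list of physical
ranges that were set up as TDX memory.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one physical range [start, end) that is usable as TDX memory. */
struct tdx_mem_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical table; the addresses are made up for illustration only. */
static const struct tdx_mem_range tdx_ranges[] = {
	{ 0x0000000000100000ULL, 0x0000000080000000ULL },
	{ 0x0000000100000000ULL, 0x0000000480000000ULL },
};

/*
 * Return true if [start, end) lies entirely inside a single TDX range.
 * Simplification: a range spanning two adjacent TDX ranges is not accepted.
 */
static bool range_is_tdx_memory(uint64_t start, uint64_t end)
{
	size_t i;

	if (start >= end)
		return false;

	for (i = 0; i < sizeof(tdx_ranges) / sizeof(tdx_ranges[0]); i++) {
		const struct tdx_mem_range *r = &tdx_ranges[i];

		if (start >= r->start && end <= r->end)
			return true;
	}
	return false;
}

/*
 * Model of a "memory going online" check: return 0 to accept the range,
 * -EINVAL to reject it so that it never reaches the page allocator.
 */
static int tdx_memory_going_online(uint64_t start, uint64_t end)
{
	return range_is_tdx_memory(start, end) ? 0 : -EINVAL;
}
```

With the made-up table above, a range inside the second entry is accepted,
while a range above 0x480000000 (think of a hot-added CXL region) is rejected
and would have to be used via devdax instead.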

I think the important thing is we need to get consensus on which direction to go
as this is kinda related to userspace ABI AFAICT.

Kirill has some thoughts on the second option, for example that we may need
some additional work to split a NUMA node that contains both TDX and non-TDX
memory.

I am not entirely clear on how hard this work will be, but my thinking is that
the above two are not necessarily conflicting. For example, from the userspace
ABI's perspective we can go with option 2, but in the meantime still reject
hotplug of non-TDX memory. This is effectively equivalent to reporting all
nodes as TDX-capable.
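
As a rough illustration of the per-node flag idea, here is a small user-space
sketch. Again, nothing here is an existing kernel interface: the bitmap, the
64-node limit and the function names are assumptions made only for the
example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64	/* assumption: small enough to fit one 64-bit mask */

/* Hypothetical per-node flag: bit n set means node n holds only TDX memory. */
static uint64_t tdx_capable_nodes;

/* Set or clear the (hypothetical) TDX-capable flag of one node. */
static void node_set_tdx_capable(unsigned int nid, bool capable)
{
	if (nid >= MAX_NODES)
		return;

	if (capable)
		tdx_capable_nodes |= 1ULL << nid;
	else
		tdx_capable_nodes &= ~(1ULL << nid);
}

/*
 * A TDX guest binding is acceptable only if it names at least one node
 * and every node it names is TDX-capable.
 */
static bool tdx_guest_binding_ok(uint64_t guest_nodes)
{
	return guest_nodes != 0 && (guest_nodes & ~tdx_capable_nodes) == 0;
}
```

While hotplug of non-TDX memory is rejected, every node would simply be marked
capable and any non-empty binding passes. If a node later gets non-TDX memory,
clearing its flag makes bindings that include it fail, without changing what
userspace has to do.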

Splitting NUMA nodes that contain both TDX and non-TDX memory can be added in
the future, as it doesn't break userspace ABI -- userspace needs to
explicitly bind TDX guests to TDX-capable nodes anyway.