Re: [PATCH v3 00/21] TDX host kernel support
From: Kai Huang
Date: Tue May 10 2022 - 06:25:24 EST
> > >
> > > >
> > > > Consider the fact that end users can break the kernel by specifying
> > > > invalid memmap= command line options. The memory hotplug code does not
> > > > take any steps to add safety in those cases because there are already
> > > > too many ways it can go wrong. TDX is just one more corner case where
> > > > the memmap= user needs to be careful. Otherwise, it is up to the
> > > > platform firmware to make sure everything in the base memory map is
> > > > TDX capable, and then all you need is documentation about the failure
> > > > mode when extending "System RAM" beyond that baseline.
> > >
> > > So the fact is, if we don't include legacy PMEMs into TDMRs, and don't do
> > > anything in memory hotplug, then if user does kmem-hot-add legacy PMEMs as
> > > system RAM, a live TD may eventually be killed.
> > >
> > > If such case is a corner case that we don't need to guarantee, then even better.
> > > And we have an additional reason that those legacy PMEMs don't need to be in
> > > TDMRs. As you suggested, we can add some documentation to point out.
> > >
> > > But the point we want to do some code check and prevent memory hotplug is, as
> > > Dave said, we want this piece of code to work on *ANY* TDX capable machines,
> > > including future machines which may, i.e. supports NVDIMM/CLX memory as TDX
> > > memory. If we don't do any code check in memory hotplug in this series, then
> > > when this code runs in future platforms, user can plug NVDIMM or CLX memory as
> > > system RAM thus break the assumption "all pages in page allocator are TDX
> > > memory", which eventually leads to live TDs being killed potentially.
> > >
> > > Dave said we need to guarantee this code can work on *ANY* TDX machines. Some
> > > documentation saying it only works one some platforms and you shouldn't do
> > > things on other platforms are not good enough:
> > >
> > > https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@xxxxxxxxx/T/#m6df45b6e1702bb03dcb027044a0dabf30a86e471
> >
> > Yes, the incompatible cases cannot be ignored, but I disagree that
> > they actively need to be prevented. One way to achieve that is to
> > explicitly enumerate TDX capable memory and document how mempolicy can
> > be used to avoid killing TDs.
>
> Hi Dan,
>
> Thanks for feedback.
>
> Could you elaborate what does "explicitly enumerate TDX capable memory" mean?
> How to enumerate exactly?
>
> And for "document how mempolicy can be used to avoid killing TDs", what
> mempolicy (and error reporting you mentioned below) are you referring to?
>
> I skipped to reply your below your two replies as I think they are referring to
> the same "enumerate" and "mempolicy" that I am asking above.
>
>
Hi Dan,
I guess "explicitly enumerate TDX capable memory" means getting the Convertible
Memory Regions (CMR). And "document how mempolicy can be used to avoid killing
TDs" means we say something like below in the documentation?
Any non TDX capable memory hot-add will result in non TDX capable pages
being potentially allocated to a TD, in which case a TD may fail to be
created or a live TD may be killed at runtime.
And "error reporting" do you mean in memory hot-add code path, we check whether
the new memory resource is TDX capable, if not we print some error similar to
above message in documentation, but still allow the memory hot-add to happen?
Something like below in add_memory_resource()?
if (platform_has_tdx() && new memory resource NOT in CMRs)
pr_err("Hot-add non-TDX memory on TDX capable system. TD may
fail to be created, or a live TD may be killed during
runtime.\n");
// allow memory hot-add anyway
I have below concerns of this approach:
1) I think we should provide a consistent service to user, that is, we either to
guarantee that TD won't be failed to be created randomly and a running TD won't
be killed during runtime, or we don't provide any TDX functionality at all. So
I am not sure only "document how mempolicy can be use to avoid killing TDs" is
good enough.
2) Above code to check whether a new memory resource is in CMRs or not requires
the kernel to get CMRs during kernel boot. However getting CMRs requires
calling SEAMCALL which requires kernel to support VMXON/VMXOFF. VMXON/VMXOFF is
currently only handled by KVM. We'd like to avoid adding VMXON/VMXOFF to core-
kernel now if not mandatory, as eventually we will very likely need to have a
reference-based approach to call VMXON/VMXOFF. This part is explained in the
cover letter in this series.
Dave suggested for now to keep things simple, we can use "winner take all"
approach: If TDX is initialized first, don't allow memory hotplug. If memory
hotplug happens first, don't allow TDX to be initialized.
https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@xxxxxxxxx/T/#mfa6b5dcc536d8a7b78522f46ccd1230f84d52ae0
I think this is perhaps more reasonable as we are at least providing some
consistent service to user. And in this approach we don't need to handle
VMXON/VMXOFF in core-kernel.
Comments?
--
Thanks,
-Kai