Re: [PATCH v3 00/21] TDX host kernel support
From: Kai Huang
Date: Wed May 04 2022 - 18:53:50 EST
On Wed, 2022-05-04 at 07:31 -0700, Dan Williams wrote:
> On Tue, May 3, 2022 at 4:59 PM Kai Huang <kai.huang@xxxxxxxxx> wrote:
> >
> > On Fri, 2022-04-29 at 13:40 +1200, Kai Huang wrote:
> > > On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> > > > On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > > > > On 4/27/22 17:37, Kai Huang wrote:
> > > > > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > > > > >
> > > > > > I thought we could document this in the documentation saying that this code can
> > > > > > only work on TDX machines that don't have above capabilities (SPR for now). We
> > > > > > can change the code and the documentation when we add the support of those
> > > > > > features in the future, and update the documentation.
> > > > > >
> > > > > > If 5 years later someone takes this code, he/she should take a look at the
> > > > > > documentation and figure out that he/she should choose a newer kernel if the
> > > > > > machine support those features.
> > > > > >
> > > > > > I'll think about design solutions if above doesn't look good for you.
> > > > >
> > > > > No, it doesn't look good to me.
> > > > >
> > > > > You can't just say:
> > > > >
> > > > > /*
> > > > > * This code will eat puppies if used on systems with hotplug.
> > > > > */
> > > > >
> > > > > and merrily await the puppy bloodbath.
> > > > >
> > > > > If it's not compatible, then you have to *MAKE* it not compatible in a
> > > > > safe, controlled way.
> > > > >
> > > > > > > You can't just ignore the problems because they're not present on one
> > > > > > > version of the hardware.
> > > > >
> > > > > Please, please read this again ^^
> > > >
> > > > OK. I'll think about solutions and come back later.
> > > > >
> > >
> > > Hi Dave,
> > >
> > > I think we have two approaches to handle memory hotplug interaction with the TDX
> > > module initialization.
> > >
> > > The first approach is simple. We just block memory from being added as system
> > > RAM managed by page allocator when the platform supports TDX [1]. It seems we
> > > can add some arch-specific-check to __add_memory_resource() and reject the new
> > > memory resource if platform supports TDX. __add_memory_resource() is called by
> > > both __add_memory() and add_memory_driver_managed() so it prevents from adding
> > > NVDIMM as system RAM and normal ACPI memory hotplug [2].
> >
> > Hi Dave,
> >
> > Try to close how to handle memory hotplug. Any comments to below?
> >
> > For the first approach, I forgot to think about memory hot-remove case. If we
> > just reject adding new memory resource when TDX is capable on the platform, then
> > if the memory is hot-removed, we won't be able to add it back. My thinking is
> > we still want to support memory online/offline because it is purely in software
> > but has nothing to do with TDX. But if one memory resource can be put to
> > offline, it seems we don't have any way to prevent it from being removed.
> >
> > So if we do above, on the future platforms when memory hotplug can co-exist with
> > TDX, ACPI hot-add and kmem-hot-add memory will be prevented. However if some
> > memory is hot-removed, it won't be able to be added back (even it is included in
> > CMR, or TDMRs after TDX module is initialized).
> >
> > Is this behavior acceptable? Or perhaps I have misunderstanding?
>
> Memory online at boot uses similar kernel paths as memory-online at
> run time, so it sounds like your question is confusing physical vs
> logical remove. Consider the case of logical offline then re-online
> where the proposed TDX sanity check blocks the memory online, but then
> a new kernel is kexec'd and that kernel again trusts the memory as TD
> convertible again just because it onlines the memory in the boot path.
> For physical memory remove it seems up to the platform to block that
> if it conflicts with TDX, not for the kernel to add extra assumptions
> that logical offline / online is incompatible with TDX.
Hi Dan,
No we don't block memory online, but we block memory add. The code I mentioned
is add_memory_resource(), while memory online code path is
memory_block_online(). Or am I wrong?
--
Thanks,
-Kai