Re: [PATCH] x86: Disable kexec for TDX guests

From: Baoquan He
Date: Tue Mar 28 2023 - 00:26:24 EST


On 03/27/23 at 02:09pm, Kirill A. Shutemov wrote:
> On Mon, Mar 27, 2023 at 09:35:54AM +0800, Baoquan He wrote:
> > On 03/26/23 at 10:01am, Dave Hansen wrote:
> > > On 3/25/23 12:25, Kirill A. Shutemov wrote:
> > > > On Sat, Mar 25, 2023 at 09:25:36AM -0700, Dave Hansen wrote:
> > > >> On 3/25/23 09:01, Kirill A. Shutemov wrote:
> > > >>> The last item is tricky. TDX guests use ACPI MADT MPWK to bring up
> > > >>> secondary CPUs. The mechanism doesn't allow to put a CPU back offline if
> > > >>> it has woken up.
> > > >> ...
> > > >>> +int arch_kexec_load(void)
> > > >>> +{
> > > >>> + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> > > >>> + pr_warn_once("Disable kexec: not yet supported in TDX guest\n");
> > > >>> + return -EOPNOTSUPP;
> > > >>> + }
> > > >>> +
> > > >>> + return 0;
> > > >>> +}
> > > >>
> > > >> So, let's put all this together:
> > > >>
> > > >> 1. TDX implementations use MADT for wakeup exclusively right now (but
> > > >> are not necessarily _required_ to do so forever)
> > > >> 2. MADT doesn't support CPU offlining
> > > >> 3. kexec() requires offlining
> > > >>
> > > >> Thus, current TDX implementations can't support TDX guests. This
> > > >> *doesn't* say that TDX will always use the MADT for wakeups.
> > > >>
> > > >> Yet, the check you have here is for TDX and *not* for the MADT.
> > > >
> > > > As I described in the commit message there are more than MADT that is
> > > > required to get kexec in TDX guest.
> > >
> > > I kinda think we should do both.
> > >
> > > Let's make sure that all systems that depend on MADT wakeups can't
> > > kexec() until the ACPI folks work out what to do there.
> > >
> > > Separately, let's either fix or *mark* the kexec()-incompatible pieces
> > > that *ARE* specific to TDX.
> > >
> > > >> That seems wrong.
> > > >>
> > > >> Let's say SEV or arm64 comes along and uses the MADT for their guests.
> > > >> They'll add another arch_kexec_load(), with a check for *their* feature.
> > > >>
> > > >> This all seems like you should be disabling kexec() the moment the MADT
> > > >> CPU wakeup is used instead of making it based on TDX.
> > > >
> > > > I guess we can go this path if you are fine with taking CR4.MCE and shared
> > > > memory reverting patches (they require some rework, but I can get them
> > > > into shape quickly). After that we can forbid kexec on machines with MADT
> > > > if nr_cpus > 1.
> > >
> > > This goes back to what I asked before: is anyone actually going to *use*
> > > a single-processor system that wants to kexec()? If not, let's not
> > > waste the time to introduce code that is just going to bitrot. Just
> > > mark it broken and move on with life.
> >
> > Now we have two API for kexec: kexec_load and kexec_file_load. They can
> > be used to do kexec reboot, or crash dumping. For crash dumping, we
> > usually only use one cpu to do the vmcore dumping. At least on our
> > Fedora/centos-stream/RHEL, we do like this with kernel parameter
> > 'nr_cpus=1' added by default. Unless people explicitly remove the
> > 'nr_cpus=1' restriction or set nr_cpus= to other number to persue
> > multithread dumping in kdump kernel.
>
> Hm. I'm not sure how to determine if the target kernel wants to use >1
> CPU. Scanning cmdline looks fragile. And who said the target kernel is
> Linux.

Ah, I forgot the checking and disabling is done in 1st kernel, it's truly
not convinent to get 2nd kernel's cmdline. Then disabling kexec on TDX
guest for the time being looks resonable to me.

>
> I guess we can park all CPUs, but CPU0 and target kernel will just fail to
> bring them up which is non-fatal issue (at least for Linux).
>
> I admit that all looks hackish.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
>