Re: [PATCH v13 20/22] x86/kexec(): Reset TDX private memory on platforms with TDX erratum

From: Huang, Kai
Date: Fri Sep 15 2023 - 07:43:05 EST


On Thu, 2023-09-14 at 21:36 +0000, Edgecombe, Rick P wrote:
> On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote:
> > diff --git a/arch/x86/kernel/machine_kexec_64.c
> > b/arch/x86/kernel/machine_kexec_64.c
> > index 1a3e2c05a8a5..03d9689ef808 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -28,6 +28,7 @@
> >  #include <asm/setup.h>
> >  #include <asm/set_memory.h>
> >  #include <asm/cpu.h>
> > +#include <asm/tdx.h>
> >  
> >  #ifdef CONFIG_ACPI
> >  /*
> > @@ -301,6 +302,14 @@ void machine_kexec(struct kimage *image)
> >         void *control_page;
> >         int save_ftrace_enabled;
> >  
> > +       /*
> > +        * For platforms with TDX "partial write machine check"
> > erratum,
> > +        * all TDX private pages need to be converted back to normal
> > +        * before booting to the new kernel, otherwise the new kernel
> > +        * may get unexpected machine check.
> > +        */
> > +       tdx_reset_memory();
> > +
> >  #ifdef CONFIG_KEXEC_JUMP
> >         if (image->preserve_context)
> >                 save_processor_state();
>
> Without a ton of knowledge on TDX arch stuff, I'm mostly looked at the
> kexec flow with respect to anything that might be tinkering with the
> PAMT. Everything there looked good to me.
>
> But I'm wondering if you want to skip the tdx_reset_memory() in the
> KEXEC_JUMP/preserve_context case. Somehow (I'm not clear on all the
> details), kexec can be configured to have the new kernel jump back to
> the old kernel and resume execution as if nothing happened. Then I
> think you would want to keep the TDX data around. Does that make any
> sense?
>

Good point. Thanks!

Based on my understanding, it should be OK to skip tdx_reset_memory() (or better
to) when preserve_context is on. The second kernel shouldn't touch first
kernel's memory anyway otherwise it may corrupt the first kernel state (if it
does this maliciously or accidentally, then the first kernel isn't guaranteed to
work anyway).  

In fact, if we do tdx_reset_memory() when preserve_memory is on, we will need to
do additional things to mark TDX as dead otherwise after jumping back other
kernel code will still believe TDX is alive and continue to use TDX.

I'll do this if I don't hear objection from other people.  

Something like below?

diff --git a/arch/x86/kernel/machine_kexec_64.c
b/arch/x86/kernel/machine_kexec_64.c
index 03d9689ef808..73ed01360408 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -307,12 +307,18 @@ void machine_kexec(struct kimage *image)
* all TDX private pages need to be converted back to normal
* before booting to the new kernel, otherwise the new kernel
* may get unexpected machine check.
+ *
+ * But skip this when preserve_context is on. The second kernel
+ * shouldn't touch the first kernel's memory anyway. Skipping
+ * this also avoids killing TDX in the first kernel, which would
+ * require more complicated handling.
*/
- tdx_reset_memory();
-
#ifdef CONFIG_KEXEC_JUMP
if (image->preserve_context)
save_processor_state();
+ else
+#else
+ tdx_reset_memory();
#endif