Re: [PATCH v3 2/5] x86/kexec: do unconditional WBINVD for bare-metal in relocate_kernel()

From: Tom Lendacky
Date: Thu Apr 11 2024 - 10:26:03 EST


On 4/10/24 17:55, Huang, Kai wrote:
On 11/04/2024 4:21 am, Tom Lendacky wrote:
On 4/7/24 07:44, Kai Huang wrote:

@@ -160,9 +160,13 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
      movq    %r9, %cr3
      /*
-     * If SME is active, there could be old encrypted cache line
-     * entries that will conflict with the now unencrypted memory
-     * used by kexec. Flush the caches before copying the kernel.
+     * The kernel could leave caches in incoherent state on SME/TDX
+     * capable platforms.  Just do unconditional WBINVD to avoid
+     * silent memory corruption to the new kernel for these platforms.
+     *
+     * But only do WBINVD for bare-metal because TDX guests and
+     * SEV-ES/SEV-SNP guests will get #VE which the kernel is unable
+     * to handle at this stage.

Similar comment here about doing an unconditional WBINVD, but then qualifying that as only on guests. This is where talking about how exception handling has been torn down would be good.


OK.

Thinking again, also it might be a good idea to not lose the existing comment for SME, because it somehow justifies why we do WBINVD _HERE_ I suppose?

How about below?

    /*
     * The kernel could leave caches in incoherent state on SME/TDX
     * capable platforms.  Flush cache to avoid silent memory
     * corruption for these platforms.
     *
     * For SME, need to flush cache here before copying the kernel.
     * When it is active, there could be old encrypted cache line
     * entries that will conflict with the now unencrypted memory
     * used by kexec.
     *
     * Do WBINVD for bare-metal to cover both SME and TDX, as it's
     * not safe to do WBINVD for TDX and SEV-ES/SEV-SNP guests.
     * WBINVD results in exception (#VE or #VC) for these guests, and
     * at this stage kernel is not able to handle such exception any
     * more because the kernel has torn down IDT.
     */

Similar to my comment in the other patch, just adding something that indicates that the WBINVD isn't necessary for TDX and SEV-ES/SEV-SNP (and maybe guests in general) would help. Maybe something like:

* Do WBINVD for bare-metal only to cover both SME and TDX. It
* isn't necessary to perform a WBINVD in a guest and performing
* one could result in an exception (#VE or #VC) for a TDX or
* SEV-ES/SEV-SNP guest that can crash the guest since, at this
* stage, the kernel has torn down the IDT.

This is all just my opinion, though, and others may read it differently. Probably not worth bike-shedding it.

Thanks,
Tom