Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec

From: Tom Lendacky
Date: Mon Feb 19 2024 - 14:50:06 EST


On 2/19/24 10:16, Borislav Petkov wrote:
On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
From: Kai Huang <kai.huang@xxxxxxxxx>

Currently on AMD SME platforms, during kexec() caches are flushed
manually before jumping to the new kernel due to memory encryption.
Intel TDX needs to flush cachelines of TDX private memory before
jumping to the second kernel too, otherwise they may silently corrupt
the new kernel.

Instead of sprinkling AMD- and Intel-specific checks around, introduce
a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both vendors and
simplify the logic:

Could the old kernel leave incoherent caches around?

"Is it possible that the kernel could leave caches in incoherent state?"

If so, do WBINVD.

Convert the AMD SME to use this new CC attribute.

A later patch will utilize this new attribute for Intel TDX too.

Yeah, once those are all in git, the concept of "subsequent patch"
becomes ambiguous depending on how you're sorting them. So try to read
that commit message out of the context of all those "subsequent patches"
and see if it still makes sense.

IOW, you should strive for your commit messages to make sense on their
own, without referencing other patches.

In this particular case, that "later patch" can go.

Specifically, AMD SME flushes caches in two places: 1) stop_this_cpu();
2) relocate_kernel(). stop_this_cpu() checks the CPUID directly and does
the WBINVD because the current kernel's SME enabling status may not
match the new kernel's choice. relocate_kernel(), however, only does
the WBINVD when the current kernel has enabled SME, because the new
kernel is always placed in an "unencrypted" area.

To simplify the logic, change AMD SME to always use the approach taken
in stop_this_cpu(). This causes an additional WBINVD in
relocate_kernel() when the current kernel hasn't enabled SME (e.g., it
was disabled on the kernel command line), but that is acceptable for the
sake of less complicated code (see [1] for the relevant discussion).

Note the kernel currently only advertises a CC vendor for AMD SME when
SME is actually enabled by the kernel. To always advertise the new
CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
status, change to set the CC vendor as long as the hardware has SME
enabled.

Note that "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
SME enabled" is still different from "checking the CPUID" (the way
stop_this_cpu() does it), but technically the former also serves the
purpose and is actually more accurate.

This change allows sme_me_mask to be 0 while the CC vendor reports as
AMD. But it doesn't impact other CC attributes on AMD platforms, nor
does it impact cc_mkdec()/cc_mkenc().

[1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@xxxxxxx/

Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxx>
Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
---
arch/x86/coco/core.c | 13 +++++++++++++
arch/x86/kernel/machine_kexec_64.c | 2 +-
arch/x86/kernel/process.c | 14 +++-----------
arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
include/linux/cc_platform.h | 15 +++++++++++++++
5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8d6d727e6e18 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
case CC_ATTR_HOST_MEM_ENCRYPT:
return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+ case CC_ATTR_HOST_MEM_INCOHERENT:
+ /*
+ * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+ * enabled on the platform regardless whether the kernel
+ * has actually enabled the SME.
+

"represents whether SME has [been] enabled ... regardless whether the
kernel has enabled SME"?!?

I think this needs to be:

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index d07be9d05cd0..e3653465e532 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
+
+ /*
+ * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+ * enabled on the platform regardless whether the kernel
+ * has actually enabled the SME.
+ */
+ case CC_ATTR_HOST_MEM_INCOHERENT:

This change won't return the correct answer. The check needs to remain against the sev_status value.

return sme_me_mask;
case CC_ATTR_HOST_MEM_ENCRYPT:


+ return !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+ /*
+ * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
+ * as it must be true when there's any SEV enable bit set in
+ * sev_status.
+ */

Superfluous comment.

case CC_ATTR_GUEST_MEM_ENCRYPT:
return sev_status & MSR_AMD64_SEV_ENABLED;
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index bc0a5348b4a6..c9c6974e2e9c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
(unsigned long)page_list,
image->start,
image->preserve_context,
- cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
+ cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
#ifdef CONFIG_KEXEC_JUMP
if (image->preserve_context)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..2c7e8d9889c0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
mcheck_cpu_clear(c);
/*
- * Use wbinvd on processors that support SME. This provides support
- * for performing a successful kexec when going from SME inactive
- * to SME active (or vice-versa). The cache must be cleared so that
- * if there are entries with the same physical address, both with and
- * without the encryption bit, they don't race each other when flushed
- * and potentially end up with the wrong entry being committed to
- * memory.
- *
- * Test the CPUID bit directly because the machine might've cleared
- * X86_FEATURE_SME due to cmdline options.
+ * Use wbinvd on processors where the first kernel *could*
+ * have left incoherent cachelines.

No need for that comment anymore - people can grep for
CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

*/
- if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+ if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
native_wbinvd();
/*
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7f72472a34d6..87e4fddab770 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
msr = __rdmsr(MSR_AMD64_SYSCFG);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
return;
+
+ /*
+ * Always set the CC vendor when the platform has SME enabled,
+ * regardless of whether the kernel actually activates SME.
+ * This reports CC_ATTR_HOST_MEM_INCOHERENT as true as long as
+ * the platform has SME enabled, so that stop_this_cpu() can do
+ * the necessary WBINVD during kexec().
+ */
+ cc_vendor = CC_VENDOR_AMD;
} else {
/* SEV state cannot be controlled by a command line option */
sme_me_mask = me_mask;
+ cc_vendor = CC_VENDOR_AMD;
goto out;
}

So you can't put it before the if - just slap it in both branches. Geez!

I think that will still work because sme_me_mask and sev_status will both be 0 on bare metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to true. However, that will cause any platform that hasn't enabled memory encryption (see SYS_CFG MSR) to also perform the WBINVD.

It looks like the idea was to keep the existing behavior, where cc_vendor wasn't set if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' is false.


diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 0166ab1780cc..1e4566cc233f 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();
+ cc_vendor = CC_VENDOR_AMD;
+
/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
/*
@@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
out:
RIP_REL_REF(sme_me_mask) = me_mask;
physical_mask &= ~me_mask;
- cc_vendor = CC_VENDOR_AMD;
cc_set_mask(me_mask);
}

Btw, pls do your patches ontop of tip/master as other patches in there
are touching this file.

@@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
out:
if (sme_me_mask) {
physical_mask &= ~sme_me_mask;
- cc_vendor = CC_VENDOR_AMD;
cc_set_mask(sme_me_mask);
}
}
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..2f7273596102 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -42,6 +42,21 @@ enum cc_attr {
*/
CC_ATTR_HOST_MEM_ENCRYPT,

This goes to the end of the enum.

+ /**
+ * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
+ * incoherent

"...can leave caches in an incoherent state."

+ *
+ * The platform/OS is running as a bare-metal system or a hypervisor.
+ * The memory encryption engine might have left non-cache-coherent
+ * data in the caches that needs to be flushed.
+ *
+ * Use this in places where the cache coherency of the memory matters
+ * but the encryption status does not.
+ *
+ * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.

If that is the case, why do you even need a new CC_ATTR define?

Might as well do:

if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
native_wbinvd();

That won't work, because the current system may not have SME active. The cases that need to be caught are kexec'ing from mem_encrypt=off to mem_encrypt=on, or from mem_encrypt=on to mem_encrypt=off.

For the first case, you need to determine if the system is SME capable, because cc_platform_has(CC_ATTR_MEM_ENCRYPT) will return false if SME is not active.

Thanks,
Tom


?

+ */
+ CC_ATTR_HOST_MEM_INCOHERENT,
+