Re: AMD Memory encryption vs. kexec

From: Tom Lendacky
Date: Wed Nov 29 2023 - 15:54:37 EST


On 11/29/23 14:01, Dave Hansen wrote:
On 11/28/23 06:03, Tom Lendacky wrote:
...
By my reading, the CC_ATTR_HOST_MEM_ENCRYPT is basically a check for
whether the current kernel has enabled SME but not SEV while the
stop_this_cpu() site is driven purely by whether the hardware *supports*
SME.

The whole supposed reason stop_this_cpu() checks CPUID directly is that
the current kernel SME/SEV enabling might not match the _next_ kernel's
enabling choices.

Correct.

So, why is a _current_ kernel check OK for relocate_kernel(), but not OK
for stop_this_cpu()?

The relocate_kernel() check provides an indication of whether SME is
actually active. The kexec kernel is placed in unencrypted memory to
match how the system was booted - where the kernel is loaded into
unencrypted memory and then encrypted in-place if SME is desired
(mem_encrypt=on). Since the kexec kernel will be unencrypted, the
cc_platform_has() call is used to indicate whether to perform a wbinvd
to remove encrypted cache line entries. If SME is not active, then there
is no need to flush caches prior to booting the kexec kernel.
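
To make the asymmetry between the two sites concrete, here is a minimal userspace sketch. It is not the kernel code: struct machine and both helper names are hypothetical stand-ins, and the two fields only model "CPUID reports SME support" and "the current kernel booted with mem_encrypt=on".

```c
#include <stdbool.h>

/* Hypothetical model of the relevant state; not a kernel structure. */
struct machine {
	bool hw_supports_sme;	/* what CPUID reports about the hardware */
	bool sme_active;	/* current kernel is running with SME enabled */
};

/*
 * Models the stop_this_cpu() site: keyed on hardware support only,
 * because the next kernel's SME/SEV choice may differ from the
 * current kernel's.
 */
static bool stop_this_cpu_needs_wbinvd(const struct machine *m)
{
	return m->hw_supports_sme;
}

/*
 * Models the relocate_kernel() site: keyed on whether SME is actually
 * active now, since only then can encrypted cache lines alias the
 * unencrypted kexec area.
 */
static bool relocate_kernel_needs_wbinvd(const struct machine *m)
{
	return m->sme_active;
}
```

On SME-capable hardware running a mem_encrypt=off kernel, the first helper returns true and the second returns false, which is the mismatch being discussed.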

Ahh, so that wbinvd is truly specific to kexec. It protects the
always-unencrypted kexec area from being zapped by encrypted lines. It
isn't necessary when the old kexec kernel is mem_encrypt=off because the
unencrypted old kernel matches the always unencrypted kexec area.

What I was worried about was the _larger_ case. Not the kexec area, the
*rest* of memory. But I think that's irrelevant because there's yet
*another* wbinvd in __enc_copy() that will flush the rest of memory
when going from mem_encrypt=off=>on.

Correct (I was actually sitting here before I got your email wondering if I should reply to my previous email with just that info).


I'd like to propose a simplification. Let's add a
CC_ATTR_HOST_MEM_INCOHERENT. That bit gets set on all hardware that
needs WBINVDs at kexec. On AMD, it can use the stop_this_cpu() logic.
This will cause an additional wbinvd in the case where a mem_encrypt=off
kernel is kexec'ing.

We can also set it on any TDX-enabled Intel hardware.

That leads to very simple logic at kexec:

Could the old kernel leave incoherent caches
around? If so, do WBINVD.

That logic gets applied to all CPUs, both boot and secondary. It
applies to all the SME-only systems (currently CC_ATTR_HOST_MEM_ENCRYPT)
and also all TDX systems. It would not depend on the current kernel's
SME enabling and it would allow both kexec-related sites to share the
same logic.

I don't really like the idea of yet another CC_ATTR_HOST_MEM_INCOHERENT
bit, but I do think it's better than adding some TDX-specific paths.
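
The proposal above could be modeled roughly as follows. This is a userspace sketch under stated assumptions, not a patch: enum vendor, struct platform and both function names are hypothetical, and the attribute is reduced to a single boolean query that both kexec-related sites would share.

```c
#include <stdbool.h>

/* Hypothetical vendor tag, for illustration only. */
enum vendor { VENDOR_AMD, VENDOR_INTEL, VENDOR_OTHER };

/* Hypothetical description of the hardware; not a kernel structure. */
struct platform {
	enum vendor vendor;
	bool hw_supports_sme;	/* AMD: CPUID reports SME support */
	bool tdx_enabled;	/* Intel: TDX is enabled on this hardware */
};

/*
 * Models the proposed CC_ATTR_HOST_MEM_INCOHERENT: true on any
 * hardware where the old kernel could leave incoherent caches
 * around, regardless of the current kernel's SME enabling.
 */
static bool host_mem_incoherent(const struct platform *p)
{
	switch (p->vendor) {
	case VENDOR_AMD:
		return p->hw_supports_sme;
	case VENDOR_INTEL:
		return p->tdx_enabled;
	default:
		return false;
	}
}

/* Single decision for all CPUs, boot and secondary, at kexec time. */
static bool kexec_needs_wbinvd(const struct platform *p)
{
	return host_mem_incoherent(p);
}
```

In this model an SME-capable AMD system always answers true, so a mem_encrypt=off kernel pays for one extra wbinvd at kexec, which is the cost mentioned above.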

I'm good with that change. I think an additional WBINVD during kexec is acceptable to make everything less complicated in the code.

Thanks,
Tom