Re: [RFC v2-fix-v2 1/1] x86/boot: Avoid #VE during boot for TDX platforms

From: Kuppuswamy, Sathyanarayanan
Date: Fri May 21 2021 - 14:31:36 EST




On 5/21/21 9:11 AM, Dave Hansen wrote:
Avoid operations which will inject #VE during boot process.
They're easy to avoid and it is less complex than handling
the exceptions.

This puts the solution before the problem. I'd also make sure to
clearly connect this solution to the problem. For instance, if you
refer to register "modification", ensure that you reflect that language
here. Don't call them "modifications" in one part of the changelog and
"operations" here. I'd also qualify them as "superfluous".

Please reorder this in the following form:

1. Background
2. Problem
3. Solution

Please do this for all of your patches.

There are a few MSRs and control register bits which the
kernel normally needs to modify during boot. But, TDX
disallows modification of these registers to help provide
consistent security guarantees ( and avoid generating #VE
when updating them).

No, the TDX architecture does not avoid generating #VE. The *kernel*
does that. This sentence conflates those two things.

Fortunately, TDX ensures that these are
all in the correct state before the kernel loads, which means
the kernel has no need to modify them.

The conditions we need to avoid are:

* Any writes to the EFER MSR
* Clearing CR0.NE
* Clearing CR3.MCE

Sathya, there have been repeated issues in your changelogs with "we's".
Remember, speak in imperative voice. Please fix this in your tooling
to find these so that reviewers don't have to.

How about the following commit log?

In TDX guests, Virtualization Exceptions (#VE) are delivered
to TDX guests due to specific guest actions like MSR writes,
CPUID leaf accesses or I/O access. But in early boot code, #VE
cannot be allowed because the required exception handler setup
support code is missing. If #VE is triggered without proper
handler support, it would lead to triple fault or kernel hang.
So, avoid operations which will inject #VE during boot process.
They're easy to avoid and it is less complex than handling the
exceptions.

There are a few MSRs and control register bits which the kernel
normally needs to modify during boot. But, TDX disallows
modification of these registers to help provide consistent
security guarantees. Fortunately, TDX ensures that these are all
in the correct state before the kernel loads, which means the
kernel has no need to modify them.

The conditions to avoid are:

* Any writes to the EFER MSR
* Clearing CR0.NE
* Clearing CR3.MCE

If above conditions are not avoided, it would lead to triple
fault or kernel hang.


+ /*
+ * Preserve current value of EFER for comparison and to skip
+ * EFER writes if no change was made (for TDX guest)
+ */
+ movl %eax, %edx
btsl $_EFER_SCE, %eax /* Enable System Call */
btl $20,%edi /* No Execute supported? */
jnc 1f
btsl $_EFER_NX, %eax
btsq $_PAGE_BIT_NX,early_pmd_flags(%rip)
-1: wrmsr /* Make changes effective */
+ /* Avoid writing EFER if no change was made (for TDX guest) */
+1: cmpl %edx, %eax
+ je 1f
+ xor %edx, %edx
+ wrmsr /* Make changes effective */
+1:

Just curious, but what if this goes wrong? Say the TDX firmware didn't
set up EFER correctly and this code does the WRMSR. What ends up
happening?

It would lead to triple fault.

Do we get anything out on the console, or is it essentially
undebuggable?


We can still get logs with debug TDX module. So it is still debugable.


+ /*
+ * Skip writing to EFER if the register already has desiered
+ * value (to avoid #VE for TDX guest).
+ */


spelling ^

There are lots of editors that can do spell checking, even in C
comments. You might want to look into that for your editor.


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer