Re: [PATCHv6 3/4] x86/tdx: Dynamically disable SEPT violations from causing #VEs

From: Kirill A. Shutemov
Date: Mon Sep 09 2024 - 07:30:16 EST


On Wed, Aug 28, 2024 at 05:27:32PM +0300, Nikolay Borisov wrote:
>
>
> On 28.08.24 г. 12:35 ч., Kirill A. Shutemov wrote:
> > Memory access #VEs are hard for Linux to handle in contexts like the
> > entry code or NMIs. But other OSes need them for functionality.
> > There's a static (pre-guest-boot) way for a VMM to choose one or the
> > other. But VMMs don't always know which OS they are booting, so they
> > choose to deliver those #VEs so the "other" OSes will work. That,
> > unfortunately has left us in the lurch and exposed to these
> > hard-to-handle #VEs.
> >
> > The TDX module has introduced a new feature. Even if the static
> > configuration is set to "send nasty #VEs", the kernel can dynamically
> > request that they be disabled. Once they are disabled, access to private
> > memory that is not in the Mapped state in the Secure-EPT (SEPT) will
> > result in an exit to the VMM rather than injecting a #VE.
> >
> > Check if the feature is available and disable SEPT #VE if possible.
> >
> > If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
> > attribute is no longer reliable. It reflects the initial state of the
> > control for the TD, but it will not be updated if someone (e.g. bootloader)
> > changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
> > determine if SEPT #VEs are enabled or disabled.
>
> LGTM. However 2 minor suggestions which might be worth addressing.
>
> Reviewed-by: Nikolay Borisov <nik.borisov@xxxxxxxx>
>
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Acked-by: Kai Huang <kai.huang@xxxxxxxxx>
> > ---
> > arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++-------
> > arch/x86/include/asm/shared/tdx.h | 10 +++-
> > 2 files changed, 69 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 08ce488b54d0..f969f4f5ebf8 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -78,7 +78,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
> > }
> > /* Read TD-scoped metadata */
> > -static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
> > +static inline u64 tdg_vm_rd(u64 field, u64 *value)
> > {
> > struct tdx_module_args args = {
> > .rdx = field,
> > @@ -193,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg)
> > __tdx_hypercall(&args);
> > }
> > +/*
> > + * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
> > + * that no #VE will be delivered for accesses to TD-private memory.
> > + *
> > + * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
> > + * controls if the guest will receive such #VE with TD attribute
> > + * ATTR_SEPT_VE_DISABLE.
> > + *
> > + * Newer TDX modules allow the guest to control if it wants to receive SEPT
> > + * violation #VEs.
> > + *
> > + * Check if the feature is available and disable SEPT #VE if possible.
> > + *
> > + * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
> > + * attribute is no longer reliable. It reflects the initial state of the
> > + * control for the TD, but it will not be updated if someone (e.g. bootloader)
> > + * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
> > + * determine if SEPT #VEs are enabled or disabled.
> > + */
> > +static void disable_sept_ve(u64 td_attr)
> > +{
> > + const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
> > + bool debug = td_attr & ATTR_DEBUG;
> > + u64 config, controls;
> > +
> > + /* Is this TD allowed to disable SEPT #VE */
> > + tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
> > + if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
>
> Should you check for the presence of those controls in in
> TDX_FEATURES0.PENDING_EPT_VIOLATION_V2 ? I.e perhaps this code can be put in
> the same function that checks the presence of RBP_NO_MOD in a different
> series by Kai Huang?

No. TDX_FEATURES0 check is not required. This bit in TDCS_CONFIG_FLAGS
cannot be anything else than FLEXIBLE_PENDING_VE and checking only this
bit is enough.

>
>
> > + /* No SEPT #VE controls for the guest: check the attribute */
> > + if (td_attr & ATTR_SEPT_VE_DISABLE)
> > + return;
>
> nit: Given that we expect most guests to actually have this attribute set
> perhaps moving this check at the top of the function will cause it exit
> early more often than not?

The attribute is not reliable source if flexible VE controls are present
as I mentioned in the commit message. We can only rely on it if there's no
TDCS_CONFIG_FLEXIBLE_PENDING_VE.

> > +
> > + /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
> > + if (debug)
> > + pr_warn("%s\n", msg);
> > + else
> > + tdx_panic(msg);
> > + return;
> > + }
> > +
> > + /* Check if SEPT #VE has been disabled before us */
> > + tdg_vm_rd(TDCS_TD_CTLS, &controls);
> > + if (controls & TD_CTLS_PENDING_VE_DISABLE)
> > + return;
> > +
> > + /* Keep #VEs enabled for splats in debugging environments */
> > + if (debug)
> > + return;
> > +
> > + /* Disable SEPT #VEs */
> > + tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
> > + TD_CTLS_PENDING_VE_DISABLE);
> > +
> > + return;
> > +}
> > +
> > static void tdx_setup(u64 *cc_mask)
> > {
> > struct tdx_module_args args = {};
>
> <snip>

--
Kiryl Shutsemau / Kirill A. Shutemov