Re: [PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask

From: Sean Christopherson
Date: Fri Nov 05 2021 - 18:11:55 EST


On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> Just like MKTME, TDX reassigns bits of the physical address for
> metadata. MKTME used several bits for an encryption KeyID. TDX
> uses a single bit in guests to communicate whether a physical page
> should be protected by TDX as private memory (bit set to 0) or
> unprotected and shared with the VMM (bit set to 1).
>
> Add a helper, tdx_shared_mask() to generate the mask. The processor
> enumerates its physical address width to include the shared bit, which
> means it gets included in __PHYSICAL_MASK by default.

This is incorrect. The shared bit _may_ be a legal PA bit, but AIUI it's not a
hard requirement.

> Remove the shared mask from 'physical_mask' since any bits in
> tdx_shared_mask() are not used for physical addresses in page table
> entries.

...

> @@ -94,6 +100,9 @@ static void tdx_get_info(void)
>
> td_info.gpa_width = out.rcx & GENMASK(5, 0);
> td_info.attributes = out.rdx;
> +
> + /* Exclude Shared bit from the __PHYSICAL_MASK */
> + physical_mask &= ~tdx_shared_mask();

This is insufficient, though it's not really the fault of this patch, the specs
themselves botch this whole thing.

The TDX Module spec explicitly states that GPAs above GPAW are considered reserved.

10.11.1. GPAW-Relate EPT Violations
GPA bits higher than the SHARED bit are considered reserved and must be 0.
Address translation with any of the reserved bits set to 1 cause a #PF with
PFEC (Page Fault Error Code) RSVD bit set.

But this is contradicted by the architectural extensions spec, which states that
a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF.
Note, this section also appears to have a bug, as it states that GPA bit 47 is
both the SHARED bit and reserved. I assume that blurb is intended to clarify
that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
the shared bit it's ok to access.

1.4.2
Guest Physical Address Translation
If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical
address width is configured to be 48, accesses with GPA bits 51:48 not all being
0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE,
even if the “EPT-violations #VE” execution control is 1.

If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit
is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
46:MAXPA in any paging structure can cause a reserved bit page fault on access.

The Module spec also calls out that the effective GPA is not to be confused with
MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA
is enumerated separately by design so that the guest doesn't incorrectly think
46:MAXPA are usable. But that is problematic for the case where MAXPA > GPAW.

The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
SHARED bit is at GPA bit GPAW-1.

I can't find the exact reference, but the TDX module always passes through host's
MAXPHYADDR. As it pertains to this patch, just doing

physical_mask &= ~tdx_shared_mask()

means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous
physical_mask, and could access "reserved" memory. If the VMM defines legal memory
with bits [MAXPHYADDR:48]!=0, explosions may ensue. That's arguably a VMM bug, but
given that the VMM is untrusted I think the guest should be paranoid when handling
the SHARED bit. I also don't know that the kernel will play nice with a discontiguous
mask.

Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR,
or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate,
this mess isn't going to be truly fixed from the guest perspective.

So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
refuse to boot if the host has defined legal memory in that range.

FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force
GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49. But on the guest side,
I think we should be paranoid.