Re: [ANNOUNCE] PUCK Notes - 2024.04.03 - TDX Upstreaming Strategy

From: Xiaoyao Li
Date: Mon Apr 08 2024 - 22:58:13 EST


On 4/9/2024 12:20 AM, Sean Christopherson wrote:
On Sun, Apr 07, 2024, Xiaoyao Li wrote:
On 4/6/2024 12:58 AM, Sean Christopherson wrote:
- For guest MAXPHYADDR vs. GPAW, rely on KVM_GET_SUPPORTED_CPUID to enumerate
the usable MAXPHYADDR[2], and simply refuse to enable TDX if the TDX Module
isn't compatible. Specifically, if MAXPHYADDR=52, 5-level paging is enabled,
but the TDX-Module only allows GPAW=0, i.e. only supports 4-level paging.

So userspace can get supported GPAW from usable MAXPHYADDR, i.e.,
CPUID(0X8000_0008).eaxx[23:16] of KVM_GET_SUPPORTED_CPUID:
- if usable MAXPHYADDR == 52, supported GPAW is 0 and 1.
- if usable MAXPHYADDR <= 48, supported GPAW is only 0.

There is another thing needs to be discussed. How does userspace configure
GPAW for TD guest?

Currently, KVM uses CPUID(0x8000_0008).EAX[7:0] in struct
kvm_tdx_init_vm::cpuid.entries[] of IOCTL(KVM_TDX_INIT_VM) to deduce the
GPAW:

int maxpa = 36;
entry = kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, 0);
if (entry)
max_pa = entry->eax & 0xff;

...
if (!cpu_has_vmx_ept_5levels() && max_pa > 48)
return -EINVAL;
if (cpu_has_vmx_ept_5levels() && max_pa > 48) {
td_params->eptp_controls |= VMX_EPTP_PWL_5;
td_params->exec_controls |= TDX_EXEC_CONTROL_MAX_GPAW;
} else {
td_params->eptp_controls |= VMX_EPTP_PWL_4;
}

The code implies that KVM allows the provided CPUID(0x8000_0008).EAX[7:0] to
be any value (when 5level ept is supported). when it > 48, configure GPAW of
TD to 1, otherwise to 0.

However, the virtual value of CPUID(0x8000_0008).EAX[7:0] inside TD is
always the native value of hardware (for current TDX).

So if we want to keep this behavior, we need to document it somewhere that
CPUID(0x8000_0008).EAX[7:0] in struct kvm_tdx_init_vm::cpuid.entries[] of
IOCTL(KVM_TDX_INIT_VM) is only for configuring GPAW, not for userspace to
configure virtual CPUID value for TD VMs.

Another option is that, KVM doesn't allow userspace to configure
CPUID(0x8000_0008).EAX[7:0]. Instead, it provides a gpaw field in struct
kvm_tdx_init_vm for userspace to configure directly.

What do you prefer?

Hmm, neither. I think the best approach is to build on Gerd's series to have KVM
select 4-level vs. 5-level based on the enumerated guest.MAXPHYADDR, not on
host.MAXPHYADDR.

I see no difference between using guest.MAXPHYADDR (EAX[23:16]) and using host.MAXPHYADDR (EAX[7:0]) to determine the GPAW (and EPT level) for TD guest. The case for TDX diverges from what for non TDX VMs. The value of them passed from userspace can only be used to configure GPAW and EPT level for TD, but won't be reflected in CPUID inside TD.

So I take it as you prefer the former option than dedicated GPAW field.

With a moderate amount of refactoring, cache/compute guest_maxphyaddr as:

static void kvm_vcpu_refresh_maxphyaddr(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;

best = kvm_find_cpuid_entry(vcpu, 0x80000000);
if (!best || best->eax < 0x80000008)
goto not_found;

best = kvm_find_cpuid_entry(vcpu, 0x80000008);
if (!best)
goto not_found;

vcpu->arch.maxphyaddr = best->eax & GENMASK(7, 0);

if (best->eax & GENMASK(15, 8))
vcpu->arch.guest_maxphyaddr = (best->eax & GENMASK(15, 8)) >> 8;
else
vcpu->arch.guest_maxphyaddr = vcpu->arch.maxphyaddr;

return;

not_found:
vcpu->arch.maxphyaddr = KVM_X86_DEFAULT_MAXPHYADDR;
vcpu->arch.guest_maxphyaddr = KVM_X86_DEFAULT_MAXPHYADDR;
}

and then use vcpu->arch.guest_maxphyaddr instead of vcpu->arch.maxphyaddr when
selecting the TDP level.

static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
/* tdp_root_level is architecture forced level, use it if nonzero */
if (tdp_root_level)
return tdp_root_level;

/*
* Use 5-level TDP if and only if it's useful/necessary. Definitely a
* more verbose comment here.
*/
if (max_tdp_level == 5 && vcpu->arch.guest_maxphyaddr <= 48)
return 4;

return max_tdp_level;
}

The only question is whether or not the behavior needs to be opt-in via a new
capability, e.g. in case there is some weird usage where userspace enumerates
guest.MAXPHYADDR < host.MAXPHYADDR but still wants/needs 5-level paging. I highly
doubt such a use case exists though.

I'll get Gerd's series applied, and will post a small series to implement the
above later this week.