Re: [PATCH 01/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures

From: Paolo Bonzini
Date: Tue Sep 10 2024 - 12:00:47 EST


On 8/13/24 00:47, Rick Edgecombe wrote:
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Add TDX's own VM and vCPU structures as placeholder to manage and run
TDX guests. Also add helper functions to check whether a VM/vCPU is
TDX or normal VMX one, and add helpers to convert between TDX VM/vCPU
and KVM VM/vCPU.

TDX protects guest VMs from malicious host. Unlike VMX guests, TDX
guests are crypto-protected. KVM cannot access TDX guests' memory and
vCPU states directly. Instead, TDX requires KVM to use a set of TDX
architecture-defined firmware APIs (a.k.a TDX module SEAMCALLs) to
manage and run TDX guests.

In fact, the way to manage and run TDX guests and normal VMX guests are
quite different. Because of that, the current structures
('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not
quite suitable for TDX guests. E.g., the majority of the members of
'struct vcpu_vmx' don't apply to TDX guests.

Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct
vcpu_tdx' respectively) for KVM to manage and run TDX guests. And
instead of building TDX's VM and vCPU structures based on VMX's, build
them directly based on 'struct kvm'.

As a result, TDX and VMX guests will have different VM size and vCPU
size/alignment.

Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate
enough space for the VM structure when creating guest. With TDX guests,
ideally, KVM should allocate the VM structure based on the VM type so
that the precise size can be allocated for VMX and TDX guests. But this
requires more extensive code change. For now, simply choose the maximum
size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure
allocation for both VMX and TDX guests. This would result in small
memory waste for each VM which has smaller VM structure size but this is
acceptable.

For simplicity, use the same way for vCPU allocation too. Otherwise KVM
would need to maintain a separate 'kvm_vcpu_cache' for each VM type.

Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling
kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However this
happens before TDX module initialization. Therefore theoretically it is
possible that 'kvm_x86_ops::vm_size' is set to size of 'struct kvm_tdx'
(when it's larger) but TDX actually fails to initialize at a later time.

Again the worst case of this is wasting couple of bytes memory for each
VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time
depending on TDX's status but that would require base KVM module to
export either kvm_x86_ops or kvm_ops_update().

Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>

Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

The ugly part here is the type-unsafety of to_vmx/to_tdx. We probably should add some "#pragma poison" of to_vmx/to_tdx: for example both can be poisoned in pmu_intel.c after the definition of vcpu_to_lbr_records(), while one of them can be poisoned in sgx.c/posted_intr.c/vmx.c/tdx.c. Not a strict requirement though.

Paolo