Re: [PATCH v2 22/24] KVM: TDX: Finalize VM initialization

From: Paolo Bonzini
Date: Tue Dec 24 2024 - 09:31:50 EST


On 11/12/24 08:38, Yan Zhao wrote:
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Introduce a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand,
KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization.

The API documentation is provided in a separate patch:
“Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)”.

Enhance TDX’s set_external_spte() hook to record the pre-mapping count
instead of returning without action when the TD is not finalized.

Adjust the pre-mapping count when pages are added or if the mapping is
dropped.

Set pre_fault_allowed to true after the finalization is complete.

Note: TD Measurement Finalization is the process by which the initial state
of the TDX VM is measured for attestation purposes. It uses the SEAMCALL
TDH.MR.FINALIZE, after which:
1. The VMM can no longer add TD private pages with arbitrary content.
2. The TDX VM becomes runnable.

Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
Co-developed-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
---
TDX MMU part 2 v2
- Merge changes from patch "KVM: TDX: Premap initial guest memory" into
this patch (Paolo)
- Consolidate nr_premapped counting into this patch (Paolo)
- Page level check should be (and is) in tdx_sept_set_private_spte() in
patch "KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror
page table" not in tdx_mem_page_record_premap_cnt() (Paolo)
- Protect finalization using kvm->slots_lock (Paolo)
- Set kvm->arch.pre_fault_allowed to true after finalization is done
(Paolo)
- Add a memory barrier to ensure correct ordering of the updates to
kvm_tdx->finalized and kvm->arch.pre_fault_allowed (Adrian)
- pre_fault_allowed must not be true before finalization is done.
Highlight that fact by checking it in tdx_mem_page_record_premap_cnt()
(Adrian)
- No need for is_td_finalized() (Rick)
- Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
wrappers (Kai)
- Add nr_premapped where it's first used (Tao)

I have just a couple imprecesions to note:
- stale reference to 'finalized'
- atomic64_read WARN should block the following atomic64_dec (there is still
a small race window but it's not worth using a dec-if-not-zero operation)
- rename tdx_td_finalizemr to tdx_td_finalize

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 61e4f126addd..eb0de85c3413 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -609,8 +609,8 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
get_page(pfn_to_page(pfn));
/*
- * To match ordering of 'finalized' and 'pre_fault_allowed' in
- * tdx_td_finalizemr().
+ * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
+ * barrier in tdx_td_finalize().
*/
smp_rmb();
if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
@@ -1397,7 +1397,7 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu)
ept_sync_global();
}
-static int tdx_td_finalizemr(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
+static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
{
struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -1452,7 +1452,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
r = tdx_td_init(kvm, &tdx_cmd);
break;
case KVM_TDX_FINALIZE_VM:
- r = tdx_td_finalizemr(kvm, &tdx_cmd);
+ r = tdx_td_finalize(kvm, &tdx_cmd);
break;
default:
r = -EINVAL;
@@ -1715,8 +1715,8 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
goto out;
}
- WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped));
- atomic64_dec(&kvm_tdx->nr_premapped);
+ if (!WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)))
+ atomic64_dec(&kvm_tdx->nr_premapped);
if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {