Re: [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX

From: Huang, Kai
Date: Wed May 15 2024 - 18:18:22 EST




On 16/05/2024 4:22 am, Isaku Yamahata wrote:
On Wed, May 15, 2024 at 08:34:37AM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d5cf5b15a10e..808805b3478d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6528,8 +6528,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
- if (tdp_mmu_enabled)
+ if (tdp_mmu_enabled) {
+ /*
+ * kvm_zap_gfn_range() is used when MTRR or PAT memory
+ * type was changed. TDX can't handle zapping the private
+ * mapping, but it's ok because KVM doesn't support either of
+ * those features for TDX. In case a new caller appears, BUG
+ * the VM if it's called for solutions with private aliases.
+ */
+ KVM_BUG_ON(kvm_gfn_shared_mask(kvm), kvm);

Please stop using kvm_gfn_shared_mask() as a proxy for "is this TDX". Using a
generic name quite obviously doesn't prevent TDX details from bleeding into common
code, and dancing around things just makes it all unnecessarily confusing.

If we can't avoid bleeding TDX details into common code, my vote is to bite the
bullet and simply check vm_type.

TDX has several aspects related to the TDP MMU.
1) Based on the faulting GPA, determine which KVM page table to walk.
(private-vs-shared)
2) Need to call TDX SEAMCALL to operate on Secure-EPT instead of direct memory
load/store. TDP MMU needs hooks for it.
3) The tables must be zapped from the leaf, not the root or the middle.

For 1) and 2), what about something like this? TDX backend code will set
kvm->arch.has_mirrored_pt = true; I think we will use kvm_gfn_shared_mask() only
for address conversion (shared<->private).

For 1), maybe we can add struct kvm_page_fault.walk_mirrored_pt
(or whatever preferable name)?

For 3), a memslot flag handles it.

---
arch/x86/include/asm/kvm_host.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aabf1648a56a..218b575d24bd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1289,6 +1289,7 @@ struct kvm_arch {
u8 vm_type;
bool has_private_mem;
bool has_protected_state;
+ bool has_mirrored_pt;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
struct list_head active_mmu_pages;
struct list_head zapped_obsolete_pages;
@@ -2171,8 +2172,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
#ifdef CONFIG_KVM_PRIVATE_MEM
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
+#define kvm_arch_has_mirrored_pt(kvm) ((kvm)->arch.has_mirrored_pt)
#else
#define kvm_arch_has_private_mem(kvm) false
+#define kvm_arch_has_mirrored_pt(kvm) false
#endif
static inline u16 kvm_read_ldt(void)

I think this 'has_mirrored_pt' (or a better name) is better, because it clearly conveys that it is about the "page table" itself, not the actual page that any page table entry maps to.

AFAICT we need to separate the concept of the "private page table itself" from the "memory type of the actual GFN".

E.g., both SEV-SNP and TDX have the concept of "private memory" (obviously), but I was told only TDX uses a dedicated private page table which isn't directly accessible to KVM. SEV-SNP, on the other hand, just uses a normal page table plus an additional HW-managed table to enforce security.

In other words, I think we should decide whether to invoke the TDP MMU callback for a private mapping (the page table itself may just be a normal one) based on fault->is_private, not on whether the page table is private:

	if (fault->is_private && kvm_x86_ops->set_private_spte)
		kvm_x86_set_private_spte(...);
	else
		tdp_mmu_set_spte_atomic(...);
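
To make the rule above concrete, here is a minimal standalone sketch, not kernel code. The types and names (fault_sketch, ops_sketch, choose_spte_path, example_private_hook) are hypothetical stand-ins made up for illustration; only the condition itself is taken from the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kvm_page_fault. */
struct fault_sketch {
	bool is_private;	/* the fault targets private memory */
};

/* Hypothetical stand-in for the vendor ops table. */
struct ops_sketch {
	/* Optional vendor hook; NULL when the vendor provides none. */
	int (*set_private_spte)(struct fault_sketch *fault);
};

enum spte_path { PATH_PRIVATE_HOOK, PATH_GENERIC };

/* Example vendor hook, present only so the sketch has something to install. */
static int example_private_hook(struct fault_sketch *fault)
{
	(void)fault;
	return 0;
}

/*
 * Choose the SPTE update path from the fault type and hook presence,
 * not from whether the page table itself is private.
 */
static enum spte_path choose_spte_path(const struct ops_sketch *ops,
				       const struct fault_sketch *fault)
{
	if (fault->is_private && ops->set_private_spte)
		return PATH_PRIVATE_HOOK;
	return PATH_GENERIC;
}
```

With this shape, a vendor without the hook falls through to the generic path even for private faults, and shared faults never reach the hook.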

And 'has_mirrored_pt' should be used only to select the root of the page table that we want to operate on.
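
A rough sketch of what "select the root" could look like, again as standalone C rather than kernel code. vm_sketch, root_kind and choose_root are hypothetical names, and the assumption that only private faults walk the mirrored root is my reading of the discussion, not something the patch defines.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct kvm_arch. */
struct vm_sketch {
	bool has_mirrored_pt;
};

enum root_kind { ROOT_DIRECT, ROOT_MIRRORED };

/*
 * has_mirrored_pt only decides which root is walked. A VM without a
 * mirrored page table uses the direct root for every fault, private
 * or not (assumption: this matches the SEV-SNP case described above).
 */
static enum root_kind choose_root(const struct vm_sketch *vm,
				  bool fault_is_private)
{
	if (vm->has_mirrored_pt && fault_is_private)
		return ROOT_MIRRORED;
	return ROOT_DIRECT;
}
```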

This also leaves room for SEV-SNP: if anything special needs to be done for pages allocated for the "non-leaf" middle page tables, it can just fit in.

Of course, if 'has_mirrored_pt' is true, we can assume there's a special way to operate on it, i.e., kvm_x86_ops->set_private_spte etc. must be valid.
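
That assumption could be written down as a consistency check. The following is a standalone sketch with hypothetical names (hooks_sketch, vm_cfg_sketch, mirrored_pt_config_ok, example_set_private_spte); in the kernel such a check would presumably be a KVM_BUG_ON() or similar at VM setup, which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vendor ops table. */
struct hooks_sketch {
	int (*set_private_spte)(void);	/* NULL when not implemented */
};

/* Hypothetical stand-in for the relevant part of struct kvm_arch. */
struct vm_cfg_sketch {
	bool has_mirrored_pt;
};

/* Example hook, present only so the sketch has something to install. */
static int example_set_private_spte(void)
{
	return 0;
}

/*
 * A mirrored page table implies the private-SPTE hook is present.
 * The reverse is not required: the hook may exist without a mirrored
 * page table.
 */
static bool mirrored_pt_config_ok(const struct vm_cfg_sketch *vm,
				  const struct hooks_sketch *hooks)
{
	return !vm->has_mirrored_pt || hooks->set_private_spte != NULL;
}
```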