On Wed, Jul 07, 2021, Brijesh Singh wrote:
Follow the recommendation from APM2 section 15.36.10 and 15.36.11 to
resolve the RMP violation encountered during the NPT table walk.
Heh, please elaborate on exactly what that recommendation is. A recommendation
isn't exactly architectural, i.e. it's subject to change :-)
And, do we have to follow the APM's recommendation? E.g. can KVM treat
#NPF RMP violations as guest errors, or is that not allowed by the GHCB spec?
I.e. can we mandate accesses be preceded by page state change requests?
It would simplify KVM (albeit not much of a simplification) and would also make debugging
easier since transitions would require an explicit guest request and guest bugs
would result in errors instead of random corruption/weirdness.
index 46323af09995..117e2e08d7ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1399,6 +1399,9 @@ struct kvm_x86_ops {
void (*write_page_begin)(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn);
void (*write_page_end)(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn);
+
+ int (*handle_rmp_page_fault)(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
+ int level, u64 error_code);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e60f54455cdc..b6a676ba1862 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5096,6 +5096,18 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
write_unlock(&vcpu->kvm->mmu_lock);
}
+static int handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ kvm_pfn_t pfn;
+ int level;
+
+ if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &level)))
+ return RET_PF_RETRY;
+
+ kvm_x86_ops.handle_rmp_page_fault(vcpu, gpa, pfn, level, error_code);
+ return RET_PF_RETRY;
+}
+
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len)
{
@@ -5112,6 +5124,14 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
goto emulate;
}
+ if (unlikely(error_code & PFERR_GUEST_RMP_MASK)) {
+ r = handle_rmp_page_fault(vcpu, cr2_or_gpa, error_code);
Adding a kvm_x86_ops hook is silly, there's literally one path, npf_interception()
that can encounter RMP violations. Just invoke snp_handle_rmp_page_fault() from
there. That works even if kvm_mmu_get_tdp_walk() stays around since it was
exported earlier.
+
+ /*
+ * If it's a shared access, then make the page shared in the RMP table.
+ */
+ if (rmpentry_assigned(e) && !private)
+ rc = snp_make_page_shared(vcpu, gpa, pfn, PG_LEVEL_4K);
Hrm, this really feels like it needs to be protected by mmu_lock. Functionally,
it might all work out in the end after enough RMP violations, but it's extremely
difficult to reason about and probably even more difficult if multiple vCPUs end
up fighting over a gfn.
My gut reaction is that this is also backwards, i.e. KVM should update the RMP
to match its TDP SPTEs, not the other way around.
The one big complication is that the TDP MMU only takes mmu_lock for read. A few
options come to mind but none of them are all that pretty. I'll wait to hear back
on whether or not we can make PSC request mandatory before thinking too hard on
this one.