Re: [PATCH Part2 v6 41/49] KVM: SVM: Add support to handle the RMP nested page fault

From: Kalra, Ashish
Date: Mon Oct 10 2022 - 22:32:23 EST


Hello Alper,

On 10/10/2022 5:03 PM, Alper Gun wrote:
On Mon, Jun 20, 2022 at 4:13 PM Ashish Kalra <Ashish.Kalra@xxxxxxx> wrote:

From: Brijesh Singh <brijesh.singh@xxxxxxx>

When SEV-SNP is enabled in the guest, the hardware places restrictions on
all memory accesses based on the contents of the RMP table. When hardware
encounters RMP check failure caused by the guest memory access it raises
the #NPF. The error code contains additional information on the access
type. See the APM volume 2 for additional information.

Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
---
arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 14 +++++---
2 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4ed90331bca0..7fc0fad87054 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)

spin_unlock(&sev->psc_lock);
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ int rmp_level, npt_level, rc, assigned;
+ struct kvm *kvm = vcpu->kvm;
+ gfn_t gfn = gpa_to_gfn(gpa);
+ bool need_psc = false;
+ enum psc_op psc_op;
+ kvm_pfn_t pfn;
+ bool private;
+
+ write_lock(&kvm->mmu_lock);
+
+ if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
+ goto unlock;
+
+ assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+ if (unlikely(assigned < 0))
+ goto unlock;
+
+ private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+ /*
+ * If the fault was due to size mismatch, or NPT and RMP page level's
+ * are not in sync, then use PSMASH to split the RMP entry into 4K.
+ */
+ if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+ (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
+ rc = snp_rmptable_psmash(kvm, pfn);


Regarding this case:
RMP level is 4K
Page table level is 2M

Does this also cause a page fault with size mismatch? If so, we
shouldn't try psmash because the rmp entry is already 4K.

I see these errors in our tests and I think it may be happening
because rmp size is already 4K.

[ 1848.752952] psmash failed, gpa 0x191560000 pfn 0x536cd60 rc 7
[ 2922.879635] psmash failed, gpa 0x102830000 pfn 0x37c8230 rc 7
[ 3010.983090] psmash failed, gpa 0x104220000 pfn 0x6cf1e20 rc 7
[ 3170.792050] psmash failed, gpa 0x108a80000 pfn 0x20e0080 rc 7
[ 3345.955147] psmash failed, gpa 0x11b480000 pfn 0x1545e480 rc 7

Shouldn't we use AND instead of OR in the if statement?


I don't think we can do that. Consider the typical case below:

[ 37.243969] #VMEXIT (NPF) - SIZEM, err 0xc80000005 npt_level 2, rmp_level 2, private 1
[ 37.243973] trying psmash gpa 0x7f790000 pfn 0x1f5d90

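This is the typical #VMEXIT(NPF) with the SIZEM error code: the guest tries to
do PVALIDATE on 4K GHCB pages while, as the log shows, both the RMP table and
the NPT are optimally set up with a 2M hugepage. With AND instead of OR, the
npt_level == PG_LEVEL_4K check would be false here, so psmash would be skipped
for this case.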
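
To make the difference concrete, here is a minimal standalone sketch (not kernel code) that evaluates both forms of the condition for the values in the log above. The PG_LEVEL_* values and the bit positions of the PFERR_GUEST_* masks are assumptions chosen to be consistent with the log (err 0xc80000005, private 1), and psmash_cond_or()/psmash_cond_and() are hypothetical helper names. The AND form assumes the rest of the suggested condition is the unchanged level check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions, assumed for illustration only. */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };
#define PFERR_GUEST_ENC_MASK   (1ULL << 34)
#define PFERR_GUEST_SIZEM_MASK (1ULL << 35)

/* Condition as posted in the patch: SIZEM fault OR level mismatch. */
bool psmash_cond_or(uint64_t error_code, int npt_level, int rmp_level)
{
	bool private = (error_code & PFERR_GUEST_ENC_MASK) != 0;

	return (error_code & PFERR_GUEST_SIZEM_MASK) ||
	       (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private);
}

/* Suggested variant: SIZEM fault AND level mismatch. */
bool psmash_cond_and(uint64_t error_code, int npt_level, int rmp_level)
{
	bool private = (error_code & PFERR_GUEST_ENC_MASK) != 0;

	return (error_code & PFERR_GUEST_SIZEM_MASK) &&
	       (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private);
}
```

For error_code 0xc80000005 with npt_level 2 and rmp_level 2, the OR form evaluates to true and the AND form to false.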

Could you investigate in more depth when the following case is being observed:
RMP level is 4K
Page table level is 2M
We shouldn't try psmash because the rmp entry is already 4K.
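
If that case does turn out to be real, one possible shape for a guard is sketched below. This is only an illustration under assumptions, not a proposed change: rmp_fault_needs_psmash() is a hypothetical helper, and the PG_LEVEL_* values and PFERR_GUEST_* bit positions are stand-ins chosen to be consistent with the log above. It keeps the OR condition from the patch but returns early when the RMP entry is already 4K. Whether hardware actually reports SIZEM with a 4K RMP entry and a 2M NPT mapping is still the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions, assumed for illustration only. */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };
#define PFERR_GUEST_ENC_MASK   (1ULL << 34)
#define PFERR_GUEST_SIZEM_MASK (1ULL << 35)

/* Hypothetical helper: decide whether the RMP entry should be split. */
bool rmp_fault_needs_psmash(uint64_t error_code, int npt_level, int rmp_level)
{
	bool private = (error_code & PFERR_GUEST_ENC_MASK) != 0;

	/* Nothing to split if the RMP entry is already 4K. */
	if (rmp_level == PG_LEVEL_4K)
		return false;

	/* Otherwise keep the condition from the patch as posted. */
	return (error_code & PFERR_GUEST_SIZEM_MASK) ||
	       (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private);
}
```

With this guard the GHCB case above (SIZEM, npt_level 2, rmp_level 2) still results in a split, while a fault with rmp_level already at 4K does not.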

Thanks,
Ashish

if ((error_code & PFERR_GUEST_SIZEM_MASK) && ...