On Thu, May 09, 2024, Michael Roth wrote:
The intended logic when handling #NPFs with the RMP bit set (31) is to
first check to see if the #NPF requires a shared<->private transition
and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
get forwarded on to userspace before proceeding with any handling of
other potential RMP fault conditions like needing to PSMASH the RMP
entry/etc (which will be done later if the guest still re-faults after
the KVM_EXIT_MEMORY_FAULT is processed by userspace).
The determination of whether any userspace handling of
KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
of kvm_mmu_page_fault(). However, the current code misinterprets the
return code, expecting 0 to indicate a userspace exit rather than less
than 0 (-EFAULT). This leads to the following unexpected behavior:
- for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private
conversions, warnings get printed from sev_handle_rmp_fault()
because it does not expect to be called for GPAs where
KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
generally do this, but it is allowed and should be handled
similarly to private->shared conversions rather than triggering any
sort of warnings
- if gmem support for 2MB folios is enabled (via currently out-of-tree
code), implicit shared<->private conversions will always result in
a PSMASH being attempted, even if it's not actually needed to
resolve the RMP fault. This doesn't cause any harm, but results in a
needless PSMASH and zapping of the sPTE
Resolve these issues by calling sev_handle_rmp_fault() only when
kvm_mmu_page_fault()'s return code is greater than or equal to 0,
indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here,
simplify the code slightly and fix up the associated comments for better
clarity.
Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
---
arch/x86/kvm/svm/svm.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 426ad49325d7..9431ce74c7d4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
svm->vmcb->control.insn_len);
/*
- * rc == 0 indicates a userspace exit is needed to handle page
- * transitions, so do that first before updating the RMP table.
+ * rc < 0 indicates a userspace exit may be needed to handle page
+ * attribute updates, so deal with that first before handling other
+ * potential RMP fault conditions.
*/
- if (error_code & PFERR_GUEST_RMP_MASK) {
- if (rc == 0)
- return rc;
+ if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
This isn't correct either. A return of '0' also indiciates "exit to userspace",
it just doesn't happen with SNP because '0' is returned only when KVM attempts
emulation, and that too gets short-circuited by svm_check_emulate_instruction().
And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
values overload is ubiquitous enough that it should be relatively self-explanatory.
Or if you prefer to keep a comment, drop the part that specifically calls out
attributes updates, because that incorrectly implies that's the _only_ reason
why KVM checks the return. But my vote is to drop the comment, because it
essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
makes the reader go "well, yeah".