Re: [PATCH 08/16] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

From: Huang, Kai
Date: Thu Feb 29 2024 - 18:23:00 EST




On 1/03/2024 12:06 pm, Sean Christopherson wrote:
On Fri, Mar 01, 2024, Kai Huang wrote:


On 28/02/2024 3:41 pm, Sean Christopherson wrote:
WARN and skip the emulated MMIO fastpath if a private, reserved page fault
is encountered, as private+reserved should be an impossible combination
(KVM should never create an MMIO SPTE for a private access).

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bd342ebd0809..9206cfa58feb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
error_code |= PFERR_PRIVATE_ACCESS;
r = RET_PF_INVALID;
- if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ if (unlikely((error_code & PFERR_RSVD_MASK) &&
+ !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
if (r == RET_PF_EMULATE)
goto emulate;

It seems this will make KVM continue to call kvm_mmu_do_page_fault() when
such private+reserve error code actually happens (e.g., due to bug), because
@r is still RET_PF_INVALID in such case.

Yep.

Is it better to just return error, e.g., -EINVAL, and give up?

As long as there is no obvious/immediate danger to the host, no obvious way for
the "bad" behavior to cause data corruption for the guest, and continuing on has
a plausible chance of working, then KVM should generally try to continue on and
not terminate the VM.

Agreed. But I think sometimes it is hard to tell whether there's any dangerous things waiting to happen, because that means we have to sanity check a lot of code, and when new patches arrive we need to keep that in mind too, which could be a nightmare in terms of maintenance.


E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track()
would skip write tracking, which could theoretically cause data corruption, so I
guess arguably it would be safer to bail?

Anyone else have an opinion? This type of bug should never escape development,
so I'm a-ok effectively killing the VM. Unless someone has a good argument for
continuing on, I'll go with Kai's suggestion and squash this:

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cedacb1b89c5..d796a162b2da 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
error_code |= PFERR_PRIVATE_ACCESS;
r = RET_PF_INVALID;
- if (unlikely((error_code & PFERR_RSVD_MASK) &&
- !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
+ return -EFAULT;

-EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this thoroughly but do you want to return -EFAULT here?