On Fri, Mar 01, 2024, Kai Huang wrote:
> On 1/03/2024 12:06 pm, Sean Christopherson wrote:
> > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
> > and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track()
> > would skip write tracking, which could theoretically cause data corruption, so I
> > guess arguably it would be safer to bail?
> >
> > Anyone else have an opinion? This type of bug should never escape development,
> > so I'm a-ok effectively killing the VM. Unless someone has a good argument for
> > continuing on, I'll go with Kai's suggestion and squash this:
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cedacb1b89c5..d796a162b2da 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >  		error_code |= PFERR_PRIVATE_ACCESS;
> >  	r = RET_PF_INVALID;
> > -	if (unlikely((error_code & PFERR_RSVD_MASK) &&
> > -		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> > +	if (unlikely(error_code & PFERR_RSVD_MASK)) {
> > +		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> > +			return -EFAULT;
>
> -EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this
> thoroughly but do you want to return -EFAULT here?

Yes, I/we do. There are many existing paths that can return -EFAULT from KVM_RUN
without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT. Userspace is responsible
for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared
to handle a "bare" -EFAULT, where for all intents and purposes "handle" means
"terminate the guest".
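
To make the userspace side concrete, here is a minimal sketch of how a VMM
might classify the result of KVM_RUN. It is not taken from any real VMM:
classify_kvm_run(), enum vmm_action and EXAMPLE_EXIT_MEMORY_FAULT are
hypothetical names, and the constant is a local stand-in for
KVM_EXIT_MEMORY_FAULT, which real code would take from <linux/kvm.h>. It
also assumes the caller passes in errno and run->exit_reason as read right
after the ioctl() returned.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for KVM_EXIT_MEMORY_FAULT; real code would use <linux/kvm.h>. */
#define EXAMPLE_EXIT_MEMORY_FAULT 39u

/* Hypothetical set of outcomes for the VMM's run loop. */
enum vmm_action {
	VMM_HANDLE_EXIT,        /* KVM_RUN succeeded, dispatch on exit_reason */
	VMM_RETRY,              /* interrupted by a signal, call KVM_RUN again */
	VMM_TRY_FIX_AND_RESUME, /* annotated memory fault, may be recoverable */
	VMM_TERMINATE_GUEST,    /* bare error, nothing safe left to do */
};

/*
 * ret:         return value of ioctl(vcpu_fd, KVM_RUN, 0)
 * err:         errno captured immediately after the ioctl
 * exit_reason: run->exit_reason from the shared kvm_run page
 */
static enum vmm_action classify_kvm_run(int ret, int err, uint32_t exit_reason)
{
	if (ret >= 0)
		return VMM_HANDLE_EXIT;

	if (err == EINTR)
		return VMM_RETRY;

	/*
	 * -EFAULT/-EHWPOISON are only actionable when KVM also filled in
	 * the memory fault exit; otherwise it is a "bare" error.
	 */
	if ((err == EFAULT || err == EHWPOISON) &&
	    exit_reason == EXAMPLE_EXIT_MEMORY_FAULT)
		return VMM_TRY_FIX_AND_RESUME;

	return VMM_TERMINATE_GUEST;
}
```

With the squashed change above, the WARN path would show up to such a loop
as a bare -EFAULT, i.e. classify_kvm_run(-1, EFAULT, <anything other than
the memory fault exit>) yields VMM_TERMINATE_GUEST.
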
That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists: it'd require
an absurd amount of work and churn in KVM to *safely* return useful information
on *all* -EFAULTs. FWIW, I had hopes and dreams of actually doing exactly this,
but have long since abandoned those dreams.
In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that
(a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that
KVM is in a state where fixing the underlying problem and resuming the guest is
safe, e.g. won't corrupt the guest (as could happen if KVM were left in a
half-baked state).