Re: [PATCH v4 3/5] KVM: SVM: Fix nested NPF injection of PFERR_GUEST_{PAGE,FINAL}_MASK bits

From: Yosry Ahmed

Date: Tue May 26 2026 - 14:38:15 EST


On Fri, May 22, 2026 at 4:27 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> From: Kevin Cheng <chengkev@xxxxxxxxxx>
>
> Fix KVM's generation of PFERR_GUEST_{PAGE,FINAL}_MASK bits when injecting a
> Nested Page Fault into L1. Currently, KVM blindly stuffs GUEST_FINAL into
> L1, which is blatantly wrong given that KVM obviously generates NPFs for
> page table accesses.
>
> There are two paths that trigger NPF injection: hardware NPF exits (from
> L2) and emulation-triggered faults, i.e. when KVM detects a NPF as part of
> emulating an L2 GVA access. For the hardware case, use the bits verbatim
> from the VMCB, as KVM is simply forwarding a NPF to L1. For the emulation
> case, propagate the GUEST_{PAGE,FINAL} bits from the access field (which
> were recently added for MBEC+GMET support).
>
> To differentiate between the two cases, add "hardware_nested_page_fault"
> to "struct x86_exception", and set it when injecting a NPF in response to
> an NPF exit from L2.

hardware_nested_page_fault is no more; the code below keys off
from_hardware, so this paragraph of the changelog is stale.

>
> To help guard against future goofs, assert that exactly one of GUEST_PAGE
> or GUEST_FINAL is set when injecting a NPF. Unlike VMX, there are no
> (known) cases where hardware doesn't set either bit, and KVM should always
> set one or the other when emulating a GVA access.
>
> Signed-off-by: Kevin Cheng <chengkev@xxxxxxxxxx>
> [sean: use plumbed in @access bits, massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
[..]
> @@ -39,19 +39,32 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb;
> + u64 fault_stage;
>
> - if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> - /*
> - * TODO: track the cause of the nested page fault, and
> - * correctly fill in the high bits of exit_info_1.
> - */
> - vmcb->control.exit_code = SVM_EXIT_NPF;
> - vmcb->control.exit_info_1 = (1ULL << 32);
> - vmcb->control.exit_info_2 = fault->address;
> - }
> + /*
> + * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
> + * available in the hardware exit_info_1, since the guest_mmu
> + * walker doesn't know whether the faulting GPA was a page table
> + * page or final page from L2's perspective.
> + */
> + if (from_hardware)
> + fault_stage = vmcb->control.exit_info_1 &
> + PFERR_GUEST_FAULT_STAGE_MASK;
> + else
> + fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
>
> - vmcb->control.exit_info_1 &= ~0xffffffffULL;
> - vmcb->control.exit_info_1 |= fault->error_code;
> + /*
> + * All nested page faults should be annotated as occurring on the
> + * final translation *or* the page walk. Arbitrarily choose "final"
> + * if KVM is buggy and enumerated both or neither.
> + */
> + if (WARN_ON_ONCE(hweight64(fault_stage) != 1))
> + fault_stage = PFERR_GUEST_FINAL_MASK;
> +
> + vmcb->control.exit_code = SVM_EXIT_NPF;
> + vmcb->control.exit_info_1 = fault_stage |
> + (fault->error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);

Do we need to rebuild exit_info_1 in the common path? If
from_hardware=true, can the fault injected by KVM have different flags
from the one produced by hardware? I guess the answer is yes (e.g. if
KVM is doing write-protection?). Might be worth a comment.

> + vmcb->control.exit_info_2 = fault->address;
>
> nested_svm_vmexit(svm);
> }
> --
> 2.54.0.794.g4f17f83d09-goog
>
>
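
For reference, the fault-stage selection in the quoted hunk can be modeled
in userspace C. This is only a sketch under stated assumptions: the PFERR_*
values mirror KVM's definitions as far as I know (GUEST_FINAL = bit 32,
GUEST_PAGE = bit 33), PFERR_GUEST_FAULT_STAGE_MASK is assumed to be the OR of
the two, and npf_exit_info_1() and exactly_one_bit() are hypothetical helpers
that do not exist in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match KVM's page fault error code layout. */
#define PFERR_PRESENT_MASK	(1ULL << 0)
#define PFERR_WRITE_MASK	(1ULL << 1)
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)
/* Assumption: the stage mask is simply the OR of the two stage bits. */
#define PFERR_GUEST_FAULT_STAGE_MASK \
	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

/* Hypothetical stand-in for "hweight64(x) == 1". */
static bool exactly_one_bit(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Hypothetical model of how the hunk computes exit_info_1.
 * hw_exit_info_1 stands in for the VMCB value on a hardware NPF exit,
 * fault_error_code for fault->error_code.
 */
static uint64_t npf_exit_info_1(bool from_hardware, uint64_t hw_exit_info_1,
				uint64_t fault_error_code)
{
	uint64_t fault_stage;

	/* Hardware exits: take the stage bits from the VMCB. */
	if (from_hardware)
		fault_stage = hw_exit_info_1 & PFERR_GUEST_FAULT_STAGE_MASK;
	else
		fault_stage = fault_error_code & PFERR_GUEST_FAULT_STAGE_MASK;

	/* Fall back to "final" if both or neither stage bit is set. */
	if (!exactly_one_bit(fault_stage))
		fault_stage = PFERR_GUEST_FINAL_MASK;

	/* All non-stage bits come from the fault's error code. */
	return fault_stage |
	       (fault_error_code & ~PFERR_GUEST_FAULT_STAGE_MASK);
}
```

Note that in the from_hardware case only the stage bits come from the VMCB.
Every other bit is taken from the fault's error code, which is the behavior
the question above is about.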