Re: [PATCH 17/28] KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations

From: mlevitsk

Date: Tue Jun 02 2026 - 10:36:02 EST

On Tue, 2026-05-05 at 21:52 +0200, Paolo Bonzini wrote:
> For EPT, PFERR_USER_MASK refers not to the CPL of the guest,
> but to the AND of the U bits encountered while walking guest
> page tables; this is consistent with how MBEC differentiates
> between XS and XU. This is available through the
> "advanced vmexit information for EPT violations" feature.
>
> Tested-by: David Riley <d.riley@xxxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx/common.h | 12 +++++++++---
> arch/x86/kvm/vmx/vmx.c    | 10 ++++++++++
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
> index 40fa72f31fc7..08005676702c 100644
> --- a/arch/x86/kvm/vmx/common.h
> +++ b/arch/x86/kvm/vmx/common.h
> @@ -100,9 +100,15 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
> error_code |= (exit_qualification & EPT_VIOLATION_PROT_USER_EXEC)
>       ? PFERR_PRESENT_MASK : 0;
>
> - if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
> - error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
> -       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
> + if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID) {
> + if (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) {
> + error_code |= PFERR_GUEST_FINAL_MASK;
> + if (exit_qualification & EPT_VIOLATION_GVA_USER)
> + error_code |= PFERR_USER_MASK;
> + } else {
> + error_code |= PFERR_GUEST_PAGE_MASK;
> + }
> + }

Minor nitpick:
Technically this code should check VMX_EPT_ADVANCED_VMEXIT_INFO_BIT before
looking at EPT_VIOLATION_GVA_USER. Otherwise we could, in theory, set
PFERR_USER_MASK based on an exit qualification bit that the CPU does not define.

Yes, in practice undefined bits are zero, and on top of that, as long as MBEC
is not supported, the MMU core should just ignore PFERR_USER_MASK. Still, even
if only for documentation purposes, it might be worth adding the check here.
What do you think?
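
To make the suggestion concrete, here is a rough standalone sketch of the
check, not a tested kernel change. The helper name, the has_adv_info parameter
(standing in for a test of VMX_EPT_ADVANCED_VMEXIT_INFO_BIT) and the numeric
bit values are assumptions made so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values for this standalone sketch. They are assumptions
 * and are not copied from the kernel headers.
 */
#define EPT_VIOLATION_GVA_IS_VALID   (1ull << 7)
#define EPT_VIOLATION_GVA_TRANSLATED (1ull << 8)
#define EPT_VIOLATION_GVA_USER       (1ull << 9)

#define PFERR_USER_MASK        (1ull << 2)
#define PFERR_GUEST_FINAL_MASK (1ull << 32)
#define PFERR_GUEST_PAGE_MASK  (1ull << 33)

/*
 * Hypothetical helper: computes only the GVA-related error code bits.
 * has_adv_info stands in for "VMX_EPT_ADVANCED_VMEXIT_INFO_BIT is set
 * in the EPT capabilities".
 */
static uint64_t ept_gva_error_bits(uint64_t exit_qualification,
				   bool has_adv_info)
{
	uint64_t error_code;

	if (!(exit_qualification & EPT_VIOLATION_GVA_IS_VALID))
		return 0;

	if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
		return PFERR_GUEST_PAGE_MASK;

	error_code = PFERR_GUEST_FINAL_MASK;

	/* Only trust the user bit when the CPU reports advanced info. */
	if (has_adv_info && (exit_qualification & EPT_VIOLATION_GVA_USER))
		error_code |= PFERR_USER_MASK;

	return error_code;
}
```

With has_adv_info false, the user bit in the exit qualification is ignored and
only PFERR_GUEST_FINAL_MASK is reported for a translated access.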

>
> if (vt_is_tdx_private_gpa(vcpu->kvm, gpa))
> error_code |= PFERR_PRIVATE_ACCESS;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f1d616f928a1..9d5cd358ccc5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2790,6 +2790,16 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> vmx_cap->vpid = 0;
> }
>
> + /*
> + * Virtualizing MBEC requires advanced vmexit information in order to
> + * distinguish supervisor and user accesses. For simplicity and clarity
> + * disable MBEC entirely if advanced vmexit information is not available,

This makes sense, but it feels a bit out of place in this patch;
it seems to belong in one of the earlier patches.

While thinking about this and the last two patches, I started to wonder
whether it is worth merging:

'KVM: VMX: enable use of MBEC',
'KVM: nVMX: pass advanced EPT violation vmexit info to guest'
'KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations'

into one patch 'KVM: VMX: enable use of MBEC', except the code that passes advanced
EPT violation info to the nested guest (the 4th hunk of the second patch).

That hunk can then become a separate patch, which can keep the name
of the second patch.

This way it will IMHO be clearer where we honour 'enable_mbec', and it closes
the (theoretical) two-patch window in which MBEC could be enabled in an
unsupported configuration.

Finally (assuming my understanding is correct; I haven't verified it),
if 'EPT advanced qualification' was added specifically for MBEC,
we can add a comment stating that CPUs which support one feature but not the
other are unlikely to exist (outside of some weird nested configurations).
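
As a rough illustration of how such a comment could sit next to the gating
logic, here is a standalone sketch. The helper name and the bit positions are
placeholders chosen for this example, and the claim in the comment is the
unverified assumption mentioned above.

```c
#include <stdint.h>

/* Placeholder bit positions, assumed for this standalone sketch. */
#define VMX_EPT_ADVANCED_VMEXIT_INFO_BIT   (1u << 22)
#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC (1u << 22)

/* Hypothetical helper modelling the gating done in setup_vmcs_config(). */
static uint32_t filter_mbec(uint32_t ept_caps, uint32_t secondary_ctls)
{
	/*
	 * Virtualizing MBEC needs advanced vmexit information to tell
	 * supervisor and user accesses apart. CPUs that support only one
	 * of the two features are not expected to exist outside of unusual
	 * nested configurations (unverified assumption), so simply drop
	 * MBEC when the advanced information is missing.
	 */
	if (!(ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT))
		secondary_ctls &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;

	return secondary_ctls;
}
```

Other secondary control bits pass through unchanged; only the MBEC bit depends
on the EPT capability.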

> + * this way mbec=1 in the kvm_intel module parameters implies availability
> + * to nested guests as well.

Best regards,
Maxim Levitsky

> + */
> + if (!(vmx_cap->ept & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT))
> + _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
> +
> if (!cpu_has_sgx())
> _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
>