Re: [PATCH v2 1/4] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation
From: Jim Mattson
Date: Thu Apr 23 2026 - 14:01:44 EST
On Thu, Apr 23, 2026 at 8:03 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> When filling the list of MSRs to be loaded by KVM on VM-Enter and VM-Exit,
> *never* insert an entry for PEBS_ENABLED if the CPU properly isolates PEBS
> events, in which case disabling counters via PERF_GLOBAL_CTRL is sufficient
> to prevent unwanted PEBS events in the guest (or host). Because perf loads
> PEBS_ENABLE with the unfiltered cpu_hw_events.pebs_enabled, i.e. with both
> host and guest masks, there is no need to load different values for the
> guest versus host; perf+KVM can and should simply control which counters
> are enabled/disabled via PERF_GLOBAL_CTRL.
>
> Avoiding touching PEBS_ENABLED fixes a theorized bug where PEBS_ENABLED can
> end up with "stuck" bits if a PEBS event is throttled between generating the
> list and actually entering the guest (Intel CPUs can't arbitrarily block
> NMIs). And stating the obvious, leaving PEBS_ENABLED as-is avoids three MSR
> writes on every VMX transition: one each on entry/exit, and one more
> explicit WRMSR to zero PEBS_ENABLED before VM-Entry (KVM assumes the only
> reason PEBS_ENABLED is in the load list is if the CPU lacks isolation and
> thus needs a quiescent period).
>
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Cc: Jim Mattson <jmattson@xxxxxxxxxx>
> Cc: Mingwei Zhang <mizhang@xxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/events/intel/core.c | 42 ++++++++++++++++++++----------------
> 1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 793335c3ce78..002d809f82ef 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4999,12 +4999,15 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> struct kvm_pmu *kvm_pmu = (struct kvm_pmu *)data;
> u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
> u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> - int global_ctrl, pebs_enable;
> + u64 guest_pebs_mask = pebs_mask & ~cpuc->intel_ctrl_host_mask;
> + int global_ctrl;
Is it worth noting somewhere that pebs_ept is not supported on any
CPUs with PMU version < 5, where a single event can set two
PEBS_ENABLE bits (cf. intel_pmu_pebs_enable)?
> /*
> * In addition to obeying exclude_guest/exclude_host, remove bits being
> * used for PEBS when running a guest, because PEBS writes to virtual
> - * addresses (not physical addresses).
> + * addresses (not physical addresses). If the guest wants to utilize
> + * PEBS, and PEBS can safely be enabled in the guest, bits for the guest's
> + * PEBS-enabled counters will be OR'd back in as appropriate.
> */
> *nr = 0;
> global_ctrl = (*nr)++;
> @@ -5051,24 +5054,25 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> };
> }
>
> - pebs_enable = (*nr)++;
> - arr[pebs_enable] = (struct perf_guest_switch_msr){
> - .msr = MSR_IA32_PEBS_ENABLE,
> - .host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> - .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
> - };
> -
> - if (arr[pebs_enable].host) {
> - /* Disable guest PEBS if host PEBS is enabled. */
> - arr[pebs_enable].guest = 0;
> - } else {
> - /* Disable guest PEBS thoroughly for cross-mapped PEBS counters. */
> - arr[pebs_enable].guest &= ~kvm_pmu->host_cross_mapped_mask;
> - arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
> - /* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
> - arr[global_ctrl].guest |= arr[pebs_enable].guest;
> - }
> + /*
> + * Disable counters where the guest PMC is different than the host PMC
> + * being used on behalf of the guest, as the PEBS record includes
> + * PERF_GLOBAL_STATUS, i.e. the guest will see overflow status for the
> + * wrong counter(s). Similarly, disallow PEBS in the guest if the host
> + * is using PEBS, to avoid bleeding host state into PEBS records.
> + */
> + guest_pebs_mask &= kvm_pmu->pebs_enable & ~kvm_pmu->host_cross_mapped_mask;
> + if (pebs_mask & ~cpuc->intel_ctrl_guest_mask)
> + guest_pebs_mask = 0;
I don't understand this clause. IIUC, it says that if we don't have
any exclude-host PEBS events, then clear PEBS_ENABLE for the guest.
Yes, any guest-programmed PEBS event should be exclude-host, but if
there is an inconsistency, shouldn't we apply a mask? What if there is
only one exclude-host PEBS event, but there are two bits set in
guest_pebs_mask?
> + /*
> + * Do NOT mess with PEBS_ENABLED. As above, disabling counters via
> + * PERF_GLOBAL_CTRL is sufficient, and loading a stale PEBS_ENABLED,
> + * e.g. on VM-Exit, can put the system in a bad state. Simply enable
> + * counters in PERF_GLOBAL_CTRL, as perf loads PEBS_ENABLED with the
> + * full value, i.e. perf *also* relies on PERF_GLOBAL_CTRL.
> + */
> + arr[global_ctrl].guest |= guest_pebs_mask;
> return arr;
> }
>
> --
> 2.54.0.545.g6539524ca2-goog
>