[PATCH 0/4] perf/x86: Don't write PEBS_ENABLED on KVM transitions

From: Sean Christopherson

Date: Tue Apr 14 2026 - 15:15:07 EST


Rework the handling of PEBS_ENABLED (and related PEBS MSRs) to *never* touch
PEBS_ENABLED if the CPU provides PEBS isolation, in which case disabling
counters via PERF_GLOBAL_CTRL is sufficient to prevent generation of unwanted
PEBS records. For vCPUs without PEBS enabled, this saves upwards of 6 MSR
writes on each roundtrip between the guest and host. For vCPUs with PEBS,
this saves 2 MSR writes per roundtrip.
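
As a rough illustration of where those numbers could come from, here is a
minimal userspace sketch of building the host<=>guest MSR switch list. It is
not the code from the patches: every sketch_* name is hypothetical, and it
assumes each listed MSR costs one write on VM-Entry and one on VM-Exit, and
that the old behavior always listed all four MSRs. Under those assumptions,
dropping PEBS_ENABLED, DS_AREA and the PEBS config MSR removes 3 entries (6
writes), and dropping only PEBS_ENABLED removes 1 entry (2 writes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR indices; the SKETCH_* names themselves are hypothetical. */
#define SKETCH_MSR_PERF_GLOBAL_CTRL	0x38f
#define SKETCH_MSR_PEBS_ENABLE		0x3f1
#define SKETCH_MSR_PEBS_DATA_CFG	0x3f2
#define SKETCH_MSR_DS_AREA		0x600

#define SKETCH_MAX_SWITCH_MSRS		4

/* One MSR to load with @guest on VM-Entry and @host on VM-Exit. */
struct sketch_switch_msr {
	uint32_t msr;
	uint64_t host;
	uint64_t guest;
};

/* Simplified per-context PMU values (an assumption, not a kernel struct). */
struct sketch_pmu_vals {
	uint64_t global_ctrl;
	uint64_t pebs_enable;
	uint64_t pebs_data_cfg;
	uint64_t ds_area;
};

static void sketch_add(struct sketch_switch_msr *list, size_t *nr,
		       uint32_t msr, uint64_t host, uint64_t guest)
{
	list[*nr].msr = msr;
	list[*nr].host = host;
	list[*nr].guest = guest;
	(*nr)++;
}

/* Fill @list (room for SKETCH_MAX_SWITCH_MSRS) and return the entry count. */
size_t sketch_build_switch_list(struct sketch_switch_msr *list,
				const struct sketch_pmu_vals *host,
				const struct sketch_pmu_vals *guest,
				bool cpu_has_pebs_isolation,
				bool guest_uses_pebs)
{
	size_t nr = 0;

	/* Always switched; this is what actually stops the counters. */
	sketch_add(list, &nr, SKETCH_MSR_PERF_GLOBAL_CTRL,
		   host->global_ctrl, guest->global_ctrl);

	/* DS_AREA and the PEBS config only matter if the guest uses PEBS. */
	if (guest_uses_pebs) {
		sketch_add(list, &nr, SKETCH_MSR_DS_AREA,
			   host->ds_area, guest->ds_area);
		sketch_add(list, &nr, SKETCH_MSR_PEBS_DATA_CFG,
			   host->pebs_data_cfg, guest->pebs_data_cfg);
	}

	/* With isolation, PEBS_ENABLED is never put on the list. */
	if (!cpu_has_pebs_isolation)
		sketch_add(list, &nr, SKETCH_MSR_PEBS_ENABLE,
			   host->pebs_enable, guest->pebs_enable);

	return nr;
}

/* Assumed cost model: one write on entry plus one on exit per entry. */
size_t sketch_writes_per_roundtrip(size_t nr_entries)
{
	return 2 * nr_entries;
}
```

The behavior for CPUs without isolation is shown here as "always switch
PEBS_ENABLED", which is a simplification for the sake of the arithmetic.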

However, performance isn't the underlying motivation. We (more accurately,
Jim, Mingwei, and Stephane) have been chasing issues where PEBS_ENABLED bits
can get "stuck" in a '1' state when running KVM guests while profiling the host
with PEBS events. The working theory is that perf throttles PEBS events in
NMI context, and thus clears bits in cpuc->pebs_enabled and PEBS_ENABLED, after
generating the list of PMU MSRs to context switch but before VM-Entry. And so
when the host's PEBS_ENABLED is loaded on VM-Exit, the CPU ends up with a
stale PEBS_ENABLED that doesn't get reset until something triggers an explicit
reload in perf.
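
To make the suspected ordering concrete, here is a toy userspace model of
the race. It only illustrates the working theory above and is not kernel
code: the sketch_* names are hypothetical, "sw_pebs_enabled" stands in for
cpuc->pebs_enabled, and "hw_pebs_enable" stands in for the PEBS_ENABLED MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU's software and hardware PEBS enable state. */
struct sketch_cpu {
	uint64_t sw_pebs_enabled;	/* models cpuc->pebs_enabled */
	uint64_t hw_pebs_enable;	/* models the PEBS_ENABLED MSR */
};

/* Step 1: snapshot the host value while building the MSR switch list. */
uint64_t sketch_snapshot_host(const struct sketch_cpu *cpu)
{
	return cpu->sw_pebs_enabled;
}

/* Step 2: an NMI throttles an event, clearing the bit in both places. */
void sketch_nmi_throttle(struct sketch_cpu *cpu, unsigned int bit)
{
	cpu->sw_pebs_enabled &= ~(1ULL << bit);
	cpu->hw_pebs_enable &= ~(1ULL << bit);
}

/* Step 3: VM-Exit loads the (now stale) host snapshot into hardware. */
void sketch_vm_exit_restore(struct sketch_cpu *cpu, uint64_t snapshot)
{
	cpu->hw_pebs_enable = snapshot;
}

/* "Stuck" means hardware has bits set that software believes are clear. */
bool sketch_is_stuck(const struct sketch_cpu *cpu)
{
	return (cpu->hw_pebs_enable & ~cpu->sw_pebs_enabled) != 0;
}
```

Running the three steps in that order on a CPU that starts with bit 0 set
in both fields leaves bit 0 set in hardware and clear in software, i.e. the
"stuck" state. If PEBS_ENABLED is never put on the switch list, step 3
never happens and the NMI's update is not overwritten.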

Testing this against our "PEBS_ENABLED is stuck" reproducer is a work in
progress (largely because the "reproducer" is currently "throw the kernel in
a big test pool"), i.e. I don't know if this actually resolves the problems we
are seeing. But even if it doesn't fully resolve our woes, it seems like a
no-brainer improvement, and if we're missing something with respect to "stuck"
PEBS_ENABLED, it'd be nice to get feedback/input asap.

Note, if the throttling theory is correct, then there are likely more fixes
that need to be done, e.g. for CPUs without isolation, and/or if
PERF_GLOBAL_CTRL can be modified from NMI context too.

Patch 4 is a cleanup that I posted as a standalone patch almost a year ago.
I included it here because it's closely related, and because I needed to
refresh it anyway.

Sean Christopherson (4):
  perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU
    has isolation
  perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS
    is unused
  perf/x86/intel: Make @data a mandatory param for
    intel_guest_get_msrs()
  perf/x86: KVM: Have perf define a dedicated struct for getting guest
    PEBS data

 arch/x86/events/core.c            |  5 ++-
 arch/x86/events/intel/core.c      | 71 +++++++++++++++++++------------
 arch/x86/events/perf_event.h      |  3 +-
 arch/x86/include/asm/kvm_host.h   |  9 ----
 arch/x86/include/asm/perf_event.h | 12 +++++-
 arch/x86/kvm/vmx/pmu_intel.c      | 20 +++++++--
 arch/x86/kvm/vmx/vmx.c            | 11 +++--
 arch/x86/kvm/vmx/vmx.h            |  2 +-
 8 files changed, 84 insertions(+), 49 deletions(-)


base-commit: 6b802031877a995456c528095c41d1948546bf45
--
2.54.0.rc0.605.g598a273b03-goog