Re: kvm guests crash when running "perf kvm top"

From: Jim Mattson

Date: Tue Mar 17 2026 - 12:02:25 EST


On Wed, Apr 9, 2025 at 10:05 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> Long story short, masking PEBS_ENABLE with the guest's value (in addition to
> what perf allows) fixes the issue on my end. Assuming testing goes well, I'll
> post this as a proper patch.
>
> --
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index cdb19e3ba3aa..1d01fb43a337 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4336,7 +4336,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  	arr[pebs_enable] = (struct perf_guest_switch_msr){
>  		.msr = MSR_IA32_PEBS_ENABLE,
>  		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> -		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
> +		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
>  	};
>
>  	if (arr[pebs_enable].host) {

Because kvm_pmu->pebs_enable is optimistic, I think we also need to
swap the order of these two modifications to the guest's GLOBAL_CTRL
value (PGC) in the code below:

	arr[global_ctrl].guest &= ~kvm_pmu->host_cross_mapped_mask;
	/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
	arr[global_ctrl].guest |= arr[pebs_enable].guest;
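
To illustrate why the order matters, here is a minimal user-space sketch, not kernel code. The two helper functions are hypothetical; their parameters stand in for arr[global_ctrl].guest, kvm_pmu->host_cross_mapped_mask and arr[pebs_enable].guest. It assumes the concern is a PEBS bit that overlaps the cross-mapped mask: with the current order the OR puts the bit back after the mask has cleared it, while with the swapped order the mask is applied last.

```c
#include <stdint.h>

/* Hypothetical user-space model; these helpers do not exist in the kernel. */

/* Order in the code quoted above: mask first, then OR in the PEBS bits. */
static uint64_t guest_global_ctrl_current(uint64_t guest_ctrl,
					  uint64_t host_cross_mapped_mask,
					  uint64_t pebs_enable_guest)
{
	guest_ctrl &= ~host_cross_mapped_mask;
	guest_ctrl |= pebs_enable_guest;	/* can re-set a bit the mask just cleared */
	return guest_ctrl;
}

/* Swapped order: OR in the PEBS bits first, then apply the mask last. */
static uint64_t guest_global_ctrl_swapped(uint64_t guest_ctrl,
					  uint64_t host_cross_mapped_mask,
					  uint64_t pebs_enable_guest)
{
	guest_ctrl |= pebs_enable_guest;
	guest_ctrl &= ~host_cross_mapped_mask;	/* cross-mapped bits always end up clear */
	return guest_ctrl;
}
```

With guest_ctrl = 0x2, a cross-mapped mask of 0x1 and a guest PEBS value of 0x1, the current order yields 0x3 and the swapped order yields 0x2. When the PEBS bits and the mask do not overlap, both orders give the same result.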


By the way, IA32_PEBS_ENABLE can be modified in NMI context by
handle_pmi_common(), so the host value returned from this function may
already be stale. We have seen cases where handle_pmi_common() clears
a bit in IA32_PEBS_ENABLE between here and the next VM-entry. VM-exit
restores the stale value with the bit set, and that bit persists
indefinitely. If the next perf event assigned to that PMC is not a
PEBS event, it magically becomes one. When an NMI arrives for PEBS
buffer overflow, perf refuses to claim it, because it doesn't think
any PEBS events are active. So, we get an "Uhhuh. NMI received for
unknown reason" message on the console. A flood of these is enough to
trigger the NMI watchdog and cause a panic.
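
The sequence above can be replayed in a small user-space model. This is only a sketch: struct model_cpu and both functions are hypothetical, the guest value loaded at VM-entry is arbitrarily taken to be 0, and the model assumes that the NMI-time throttle clears the bit in both cpuc->pebs_enabled and the MSR.

```c
#include <stdint.h>

/* Hypothetical user-space model of one CPU; not kernel code. */
struct model_cpu {
	uint64_t msr_pebs_enable;	/* stands in for the hardware IA32_PEBS_ENABLE */
	uint64_t pebs_enabled;		/* stands in for cpuc->pebs_enabled */
};

/* Models handle_pmi_common() throttling one PEBS counter from NMI context. */
static void model_pmi_clears_bit(struct model_cpu *cpu, unsigned int idx)
{
	cpu->pebs_enabled &= ~(1ULL << idx);
	cpu->msr_pebs_enable = cpu->pebs_enabled;
}

/*
 * Replays: snapshot, NMI, VM-entry, VM-exit.
 * Returns the bits left set in the MSR that perf no longer tracks.
 */
static uint64_t model_stale_bits_after_vmexit(uint64_t initial, unsigned int idx)
{
	struct model_cpu cpu = { .msr_pebs_enable = initial, .pebs_enabled = initial };
	uint64_t host_snapshot = cpu.pebs_enabled;	/* taken in intel_guest_get_msrs() */

	model_pmi_clears_bit(&cpu, idx);		/* NMI lands before VM-entry */
	cpu.msr_pebs_enable = 0;			/* VM-entry loads the guest value (0 here) */
	cpu.msr_pebs_enable = host_snapshot;		/* VM-exit restores the stale host value */

	return cpu.msr_pebs_enable & ~cpu.pebs_enabled;
}
```

Starting from 0x3 and throttling counter 0, the model reports 0x1: the bit is set in the MSR but absent from perf's software state, which is the mismatch that leaves the later NMI unclaimed.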

I think we need a fixup after VM-exit to clear any IA32_PEBS_ENABLE
bits that were cleared by handle_pmi_common() between
intel_guest_get_msrs() and VM-entry, but I'm not sure what the best
API might be. Calling intel_guest_get_msrs() again seems too
heavyweight. Maybe KVM could ask perf nicely to just rewrite the MSR
with cpuc->pebs_enabled? Note that any erroneous IA32_PEBS_ENABLE bits
are dormant post VM-exit, since any PMCs throttled by
handle_pmi_common() will have their enable bits cleared in the
corresponding event selector.
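
As a rough sketch of what such a fixup would compute (a user-space model only; model_pebs_fixup_after_vmexit() is a hypothetical name, not a proposed API), the bits to clear are those that were in the snapshot but are no longer in cpuc->pebs_enabled:

```c
#include <stdint.h>

/*
 * Hypothetical user-space model; not kernel code.
 *
 * restored_msr:     IA32_PEBS_ENABLE as restored by VM-exit
 * host_snapshot:    host value computed earlier by intel_guest_get_msrs()
 * pebs_enabled_now: current cpuc->pebs_enabled
 */
static uint64_t model_pebs_fixup_after_vmexit(uint64_t restored_msr,
					      uint64_t host_snapshot,
					      uint64_t pebs_enabled_now)
{
	/* Bits present in the snapshot that perf has cleared since then. */
	uint64_t cleared_since_snapshot = host_snapshot & ~pebs_enabled_now;

	return restored_msr & ~cleared_since_snapshot;
}
```

In the simple case where the restored value equals the snapshot and no new bits were set in the interim, this gives the same result as rewriting the MSR with cpuc->pebs_enabled.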