Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM
From: Yosry Ahmed
Date: Thu Apr 30 2026 - 23:35:27 EST
On Thu, Apr 30, 2026 at 4:24 PM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
>
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> > index 0e99022168a85..0c372b9f8ed34 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -36,6 +36,7 @@ struct kvm_pmu_ops {
> > void (*reset)(struct kvm_vcpu *vcpu);
> > void (*deliver_pmi)(struct kvm_vcpu *vcpu);
> > void (*cleanup)(struct kvm_vcpu *vcpu);
> > + void (*reprogram_counters)(struct kvm_vcpu *vcpu, u64 counters);
> >
> > bool (*is_mediated_pmu_supported)(struct x86_pmu_capability *host_pmu);
> > void (*mediated_load)(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> > index 7aa298eeb0721..fe6f2bb79ab83 100644
> > --- a/arch/x86/kvm/svm/pmu.c
> > +++ b/arch/x86/kvm/svm/pmu.c
> > @@ -260,6 +260,48 @@ static void amd_mediated_pmu_put(struct kvm_vcpu *vcpu)
> > wrmsrq(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, pmu->global_status);
> > }
> >
> > +static void amd_mediated_pmu_handle_host_guest_bits(struct kvm_vcpu *vcpu,
> > + struct kvm_pmc *pmc)
> > +{
> > + u64 host_guest_bits;
> > +
> > + if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> > + return;
> > +
> > + /* Count all events if both bits are cleared */
> > + host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > + if (!host_guest_bits)
> > + return;
> > +
> > + /*
> > + * If EFER.SVME is set, the counter is disabledd if only one of the bits
> > + * is set and it doesn't match the vCPU context. If EFER.SVME is
> > + * cleared, the counter is disable if any of the bits is set.
> > + */
> > + if (vcpu->arch.efer & EFER_SVME) {
> > + if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > + return;
> > +
> > + if (!!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu))
> > + return;
> > + }
> > +
> > + pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
>
> Sashiko raised a good point here. In the following patch, we reprogram
> the counters synchronously on nested transitions to update the
> counters' enablement before counting VMRUN or WRMSR(EFER.SVME).
> However, this updates pmc->eventsel_hw while
> kvm_pmu_recalc_pmc_emulation() checks pmc->eventsel to check if the
> counter is enabled.
>
> I think either pmc_is_locally_enabled() needs to check
> pmc->eventsel_hw or we need to update pmc->eventsel here.
>
> AFAICT, pmc->eventself has the value written to the MSR, so I think we
> want to keep that as-is.
>
> On the other hand, ARCH_PERFMON_EVENTSEL_ENABLE is cleared in
> pmc->eventsel_hw in kvm_mediated_pmu_refresh_event_filter() if the
> event is not allowed, and kvm_pmu_recalc_pmc_emulation() has a comment
> about intentionally ignoring event filters.
>
> We can also separately track enablement for nested purposes, but that
> would add one more thing we need to check aside from general counter
> enablement and event filtering.
>
> None of these options are ideal, perhaps directly clearing the bit in
> pmc->eventsel would do the least damage as (pmc->eventsel &
> ARCH_PERFMON_EVENTSEL_ENABLE) is only checked by
> pmc_is_locally_enabled().
No, we can't really clear the bit in pmc->eventsel as that's what the
guest reads with RDMSR.
The more I think about this, the more I think we should drop
pmc->eventsel_hw. It serves two purposes AFAICT:
1. On AMD, we use it to clear HOSTONLY and set GUESTONLY in actual HW.
2. For event filtering, we use it to clear EVENTSEL_ENABLE.
But maybe it's easier to explicitly track the changes we need to apply
to eventsel rather than a HW version?
(1) is trivial, we can just clear HOSTONLY and set GUESTONLY before
doing the actual write as that's constant.
For (2), instead of eventsel_hw, maybe add a boolean called 'filtered'
to track if that PMC should be filtered out or not. Then, we can add
another boolean to track if the counter is disabled due to
nested/HG-bits (e.g. 'nested_disabled').
With that, pmc_is_locally_enabled() would check:
(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) && !pmc->nested_disabled
, and then before writing to HW we check pmc->filtered,
pmc->nested_disabled, as well as do the HOSTONLY/GUESTONLY changes for
AMD.
Actually, instead of a boolean, maybe add 'disabled_reasons' flags,
with possible flags like PMC_DISABLED_FILTER and PMC_DISABLED_NESTED.
This might all be unclear, I will draft some diff on top of the series
tomorrow and send it in case it makes things clearer.