Re: [PATCH v5 07/13] KVM: x86/pmu: Disable counters based on Host-Only/Guest-Only bits in SVM

From: Sean Christopherson

Date: Tue May 05 2026 - 15:58:48 EST

On Tue, May 05, 2026, Yosry Ahmed wrote:
> > > Did you see my other replies and code snippet tracking disabling
> > > reasons? I think the code snippet would still work, I just need to
> > > move the pmc_is_nested_disabled() check into pmc_is_locally_enabled().
> >
> > I did. IMO, all of what you proposed is an optimization to avoid the "costly"
> > checks at the time of pmc_is_locally_enabled(). In quotes because I don't think
> > the _overall_ cost is actually all that high. pmc_is_locally_enabled() is only
> > called in relatively slow paths, and my guess is the CALL+RET (or untrained RET,
> > ugh) is probably more expensive than the logic itself.
> >
> > The very nice side effect of incorporating the logic into pmc_is_locally_enabled()
> > is that I _think_ we can drop kvm_pmu_ops.reprogram_counters(), because
> > amd_mediated_pmu_handle_host_guest_bits() will instead be:
> >
> > static bool amd_pmc_is_locally_disabled(struct kvm_pmc *pmc)
> > {
> > struct kvm_pmu *pmu = pmc_to_pmu(pmc);
> > struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
> > u64 host_guest_bits;
> >
> > /* Common code is supposed to check the common enable bit. */
> > if (WARN_ON_ONCE(!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE)))
> > return false;
> >
> > /*
> > * If both bits are cleared, always keep the counter enabled. Otherwise,
> > * counter enablement needs to be re-evaluated on every nested
> > * transition (and EFER.SVME change).
> > */
> > host_guest_bits = pmc->eventsel & AMD64_EVENTSEL_HOST_GUEST_MASK;
> > if (!host_guest_bits)
> > return true;
> >
> > /* If either bit is set and EFER.SVME=0, the counter is disabled. */
> > if (!(vcpu->arch.efer & EFER_SVME))
> > return false;
> >
> > if (host_guest_bits == AMD64_EVENTSEL_HOST_GUEST_MASK)
> > return true;
> >
> > return !!(host_guest_bits & AMD64_EVENTSEL_GUESTONLY) == is_guest_mode(vcpu);
> > }
>
> If we do this and drop kvm_pmu_ops.reprogram_counters(), we still need
> somewhere to actually clear ARCH_PERFMON_EVENTSEL_ENABLE in eventsel_hw.

Yep. I was thinking we'd do that as part of reprogram_counter().

> What if we call kvm_pmu_ops.pmc_is_locally_disabled() at the top of
> reprogram_counter(), cache the result, and use that for eventsel_hw
> modification and in pmc_is_locally_enabled()?

Why bother caching it? I suspect it will make the code harder to read, I would
generally prefer pmc_is_locally_enabled() be self-sufficient, and I don't think
the caching will elide any lookups. Legacy PMU doesn't care, and even if it did,
it should only query pmc_is_locally_enabled() once. And the mediated PMU should
only query it once.

> We'd also probably want to rename it. I would honeslty just use 'nested'
> instead of 'locally_disabled' and 'mode_specific_enables' as that's the
> only current user.

But it's not strictly nested specific. E.g. even the vCPU doesn't support nested
virtualization, a (stupid) guest can still set HOST_ONLY and effectively disable
the counter, thanks to the bizarro behavior of HOST_ONLY when EFER.SVME=0.

> Something like this with your proposed amd_pmc_is_locally_disabled()
> above, which is similar to the kvm_pmu_ops.mediated_reprogram_counter()
> implementation in v4 except that the vendor-specific callback is more
> targeted:
>
> static void pmc_check_nested_disabled(struct kvm_pmc *pmc)
> {
> if (!(pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE))
> return;
>
> if (!test_bit(pmc->idx, &pmu->pmc_has_nested_enables))
> return;
>
> pmc->is_nested_disabled = kvm_pmu_call(pmc_is_nested_disabled)(pmc);
> if (!pmc->is_nested_disabled)
> pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;

I don't want multiple paths mucking with eventsel_hw's ARCH_PERFMON_EVENTSEL_ENABLE.
That's how we end up with ordering bugs. E.g. pmc_check_nested_disabled() *must*
be called after kvm_mediated_pmu_refresh_event_filter(), which is gross and brittle.

E.g. something like so. We can even optimize away the PMU filter lookup (which
I suspect would be more expensive in the common case?) if the counter is disabled
thanks to H/G bits.

diff --git arch/x86/kvm/pmu.c arch/x86/kvm/pmu.c
index 7b2b4ce6bdad..4ca4a0078d35 100644
--- arch/x86/kvm/pmu.c
+++ arch/x86/kvm/pmu.c
@@ -530,21 +530,24 @@ static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
return is_fixed_event_allowed(filter, pmc->idx);
}

-static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
+static void kvm_mediated_pmu_refresh_eventsel_hw(struct kvm_pmc *pmc)
{
- bool allowed = pmc_is_event_allowed(pmc);
struct kvm_pmu *pmu = pmc_to_pmu(pmc);

if (pmc_is_gp(pmc)) {
pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
- if (allowed)
+ if (!test_bit(pmc->idx, &pmu->pmc_has_mode_specific_enables) &&
+ kvm_pmu_call(pmc_is_locally_disabled(pmc)))
+ return;
+
+ if (pmc_is_event_allowed(pmc))
pmc->eventsel_hw |= pmc->eventsel &
ARCH_PERFMON_EVENTSEL_ENABLE;
} else {
u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);

pmu->fixed_ctr_ctrl_hw &= ~mask;
- if (allowed)
+ if (pmc_is_event_allowed(pmc))
pmu->fixed_ctr_ctrl_hw |= pmu->fixed_ctr_ctrl & mask;
}
}

Actually, this code is being silly. It unnecessarily performs the PMU filter
lookup when the _guest_ disables the counters via eventsel. If you first "fix"
that by querying pmc_is_locally_enabled() before checking the event filter, then
you won't even need to touch that code when introducing H/G bits, because it will
Just Work thanks to pmc_is_locally_enabled() doing the right thing (as it should,
becase as mentioned early, that logic is architectural).

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7b2b4ce6bdad..c84f2f971ab1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -530,22 +530,23 @@ static bool pmc_is_event_allowed(struct kvm_pmc *pmc)
return is_fixed_event_allowed(filter, pmc->idx);
}

-static void kvm_mediated_pmu_refresh_event_filter(struct kvm_pmc *pmc)
+static void kvm_mediated_pmu_refresh_eventsel_hw(struct kvm_pmc *pmc)
{
- bool allowed = pmc_is_event_allowed(pmc);
+ bool allowed = pmc_is_locally_enabled(pmc) && pmc_is_event_allowed(pmc);
struct kvm_pmu *pmu = pmc_to_pmu(pmc);

if (pmc_is_gp(pmc)) {
- pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
if (allowed)
- pmc->eventsel_hw |= pmc->eventsel &
- ARCH_PERFMON_EVENTSEL_ENABLE;
+ pmc->eventsel_hw |= ARCH_PERFMON_EVENTSEL_ENABLE;
+ else
+ pmc->eventsel_hw &= ~ARCH_PERFMON_EVENTSEL_ENABLE;
} else {
u64 mask = intel_fixed_bits_by_idx(pmc->idx - KVM_FIXED_PMC_BASE_IDX, 0xf);

- pmu->fixed_ctr_ctrl_hw &= ~mask;
if (allowed)
- pmu->fixed_ctr_ctrl_hw |= pmu->fixed_ctr_ctrl & mask;
+ pmu->fixed_ctr_ctrl_hw |= mask;
+ else
+ pmu->fixed_ctr_ctrl_hw &= ~mask;
}
}

@@ -558,7 +559,7 @@ static int reprogram_counter(struct kvm_pmc *pmc)
u8 fixed_ctr_ctrl;

if (kvm_vcpu_has_mediated_pmu(pmu_to_vcpu(pmu))) {
- kvm_mediated_pmu_refresh_event_filter(pmc);
+ kvm_mediated_pmu_refresh_eventsel_hw(pmc);
return 0;
}

> Also, would you rather I send a new version with everything, or do you
> want to pick up some of the patches in this version independently?

Uh, probably just send a new version. I doubt I'll get through the first few
patches before you're ready to send the next version.