Re: [RFC PATCH v3 5/8] KVM: arm64: Introduce module param to partition the PMU

From: Oliver Upton
Date: Wed Mar 26 2025 - 16:41:11 EST


On Wed, Mar 26, 2025 at 05:38:34PM +0000, James Clark wrote:
> On 25/03/2025 6:32 pm, Colton Lewis wrote:
> > > I don't know if this is a stupid idea, but instead of having a fixed
> > > number for the partition, wouldn't it be nice if we could trap and
> > > increment HPMN on the first guest use of a counter, then decrement it on
> > > guest exit depending on what's still in use? The host would always
> > > assign its counters from the top down, and guests go bottom up if they
> > > want PMU passthrough. Maybe it's too complicated or won't work for
> > > various reasons, but because of BRBE the counter partitioning changes go
> > > from an optimization to almost a necessity.
> >
> > This is a cool idea that would enable useful things. I can think of a
> > few potential problems.
> >
> > 1. Partitioning will give guests direct access to some PMU counter
> > registers. There is no reliable way for KVM to determine what is in use
> > from that state. A counter that is disabled at guest exit might only be
> > so temporarily, which could lead to a lot of thrashing allocating and
> > deallocating counters.

KVM must always have a reliable way to determine if the PMU is in use.
Checking whether kvm_pmu_counter_is_enabled() is true for any counter
in the vPMU would do the trick...

Generally speaking, I would like to see the guest/host context switch in
KVM modeled in a way similar to the debug registers, where the vPMU
registers are loaded onto hardware lazily if either:

1) The above definition of an in-use PMU is satisfied

2) The guest accessed a PMU register since the last vcpu_load()
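
Roughly, and purely as an illustration, the load decision could be
sketched as below. This is not the KVM code: the struct, its fields, and
the helper names are all made up for the example, and the per-counter
check only loosely mirrors what kvm_pmu_counter_is_enabled() looks at
(a global enable plus a per-counter enable bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of vPMU state; not the actual KVM structures. */
struct vpmu_sketch {
	bool global_enable;        /* stands in for the guest's PMCR_EL0.E */
	uint64_t cnt_enable_mask;  /* stands in for per-counter enable bits */
	bool accessed_since_load;  /* set on a trapped PMU register access */
};

/* Hypothetical analogue of a per-counter "is enabled" check. */
static bool vpmu_counter_is_enabled(const struct vpmu_sketch *p,
				    unsigned int idx)
{
	return p->global_enable &&
	       (p->cnt_enable_mask & (UINT64_C(1) << idx));
}

/* Condition 1: at least one counter is enabled. */
static bool vpmu_in_use(const struct vpmu_sketch *p,
			unsigned int nr_counters)
{
	for (unsigned int i = 0; i < nr_counters; i++) {
		if (vpmu_counter_is_enabled(p, i))
			return true;
	}
	return false;
}

/* Load onto hardware if condition 1 or condition 2 holds. */
static bool vpmu_needs_hw_load(const struct vpmu_sketch *p,
			       unsigned int nr_counters)
{
	return vpmu_in_use(p, nr_counters) || p->accessed_since_load;
}
```

In this sketch the accessed_since_load flag would be cleared at load
time and set by the first trapped access, so a guest that never touches
the PMU never pays for the context switch.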

> > 2. HPMN affects reads of PMCR_EL0.N, which is the standard way to
> > determine how many counters there are. If HPMN starts as a low number,
> > guests have no way of knowing there are more counters
> > available. Dynamically changing the counters available could be
> > confusing for guests.
> >
>
> Yes I was expecting that PMCR would have to be trapped and N reported to be
> the number of physical counters rather than how many are in the guest
> partition.

I'm not sure this is aligned with the spirit of the feature.

Colton's aim is to minimize the overheads of trapping the PMU *and*
relying on the perf subsystem for event scheduling. To do dynamic
partitioning as you've described, KVM would need to unconditionally trap
the PMU registers so it can pack the guest counters into the guest
partition. We cannot assume the VM will allocate counters sequentially.
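
To make the packing problem concrete, here is a small standalone sketch
(hypothetical names, not anything in KVM) of what remapping would have
to do. If the guest uses virtual counters 0 and 5, a partition of two
counters only works if virtual counter 5 is redirected to physical
counter 1, and that redirection is exactly what requires the traps.

```c
#include <stdint.h>

#define SKETCH_MAX_COUNTERS 31

/*
 * Hypothetical helper: assign each in-use virtual counter the lowest
 * free physical index. map[v] receives the physical index for virtual
 * counter v, or -1 if v is unused. Returns how many physical counters
 * the guest partition would need.
 */
static unsigned int pack_guest_counters(uint32_t used_mask,
					int map[SKETCH_MAX_COUNTERS])
{
	unsigned int next = 0;

	for (unsigned int v = 0; v < SKETCH_MAX_COUNTERS; v++) {
		if (used_mask & (UINT32_C(1) << v))
			map[v] = (int)next++;
		else
			map[v] = -1;
	}
	return next;
}
```

Without the remap, the partition would have to cover the highest index
in use (six counters in the example above) rather than the number of
counters actually in use (two).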

Dynamic counter allocation can be had with the existing PMU
implementation. The partitioned PMU is an alternative userspace can
select, not a replacement for what we already have.

Thanks,
Oliver