Re: [RFC PATCH v3 5/8] KVM: arm64: Introduce module param to partition the PMU

From: James Clark
Date: Thu Mar 27 2025 - 05:18:23 EST




On 26/03/2025 8:40 pm, Oliver Upton wrote:
On Wed, Mar 26, 2025 at 05:38:34PM +0000, James Clark wrote:
On 25/03/2025 6:32 pm, Colton Lewis wrote:
I don't know if this is a stupid idea, but instead of having a fixed
number for the partition, wouldn't it be nice if we could trap and
increment HPMN on the first guest use of a counter, then decrement it on
guest exit depending on what's still in use? The host would always
assign its counters from the top down, and guests go bottom up if they
want PMU passthrough. Maybe it's too complicated or won't work for
various reasons, but because of BRBE the counter partitioning changes go
from an optimization to almost a necessity.
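
To make the top-down/bottom-up idea concrete, here is a rough userspace sketch in C. It is not KVM code: struct pmu_split, host_alloc(), guest_grow(), guest_shrink() and NR_COUNTERS are all hypothetical names, and it assumes the guest owns counters [0, hpmn) while the host takes whatever is free above that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_COUNTERS 6	/* hypothetical number of event counters */

/* Hypothetical model of the split, not a KVM structure. */
struct pmu_split {
	unsigned int hpmn;	/* counters [0, hpmn) belong to the guest */
	uint32_t host_mask;	/* counters currently held by the host */
};

/* Host allocates from the top down and never enters the guest range. */
static int host_alloc(struct pmu_split *s)
{
	for (int i = NR_COUNTERS - 1; i >= (int)s->hpmn; i--) {
		if (!(s->host_mask & (1u << i))) {
			s->host_mask |= 1u << i;
			return i;
		}
	}
	return -1;	/* nothing free above the guest range */
}

/* On first guest use of a new counter: grow the guest range by one. */
static bool guest_grow(struct pmu_split *s)
{
	if (s->hpmn >= NR_COUNTERS || (s->host_mask & (1u << s->hpmn)))
		return false;	/* next counter is taken by the host */
	s->hpmn++;
	return true;
}

/* On guest exit: shrink down to the highest counter still in use. */
static void guest_shrink(struct pmu_split *s, uint32_t guest_in_use)
{
	assert(s->hpmn <= NR_COUNTERS);
	while (s->hpmn > 0 && !(guest_in_use & (1u << (s->hpmn - 1))))
		s->hpmn--;
}
```

The sketch only shows the bookkeeping. It says nothing about how "in use" would be determined or what the trap costs.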

This is a cool idea that would enable useful things. I can think of a
few potential problems.

1. Partitioning will give guests direct access to some PMU counter
registers. There is no reliable way for KVM to determine what is in use
from that state. A counter that is disabled at guest exit might only be
so temporarily, which could lead to a lot of thrashing allocating and
deallocating counters.

KVM must always have a reliable way to determine if the PMU is in use.
Checking whether there's any counter in the vPMU for which
kvm_pmu_counter_is_enabled() is true would do the trick...

Generally speaking, I would like to see the guest/host context switch in
KVM modeled in a way similar to the debug registers, where the vPMU
registers are loaded onto hardware lazily if either:

1) The above definition of an in-use PMU is satisfied

2) The guest accessed a PMU register since the last vcpu_load()
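
The two conditions above can be sketched as a simple predicate. This is a simplified model rather than the KVM implementation: struct vpmu_model and its fields are hypothetical, and "enabled" is assumed here to mean a global enable bit plus at least one bit in a per-counter enable mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU PMU state, for illustration only. */
struct vpmu_model {
	bool global_enable;		/* models the PMCR_EL0.E bit */
	uint32_t cnten_mask;		/* models per-counter enable bits */
	bool accessed_since_load;	/* set on a trapped access, cleared at load */
};

/* Condition 1: some counter is enabled, so the PMU counts as in use. */
static bool vpmu_in_use(const struct vpmu_model *p)
{
	return p->global_enable && p->cnten_mask != 0;
}

/* Load the vPMU onto hardware only if either condition holds. */
static bool vpmu_should_load(const struct vpmu_model *p)
{
	return vpmu_in_use(p) || p->accessed_since_load;
}
```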

2. HPMN affects reads of PMCR_EL0.N, which is the standard way to
determine how many counters there are. If HPMN starts as a low number,
guests have no way of knowing there are more counters
available. Dynamically changing the counters available could be
confusing for guests.


Yes, I was expecting that PMCR would have to be trapped and N reported as
the number of physical counters rather than how many are in the guest
partition.
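
As a rough illustration of what reporting a different N on a trapped read would involve, here is a sketch that rewrites the N field of a PMCR_EL0 value. The N field occupies bits [15:11] of PMCR_EL0. The helper names and macros are hypothetical, and nothing here reflects how KVM actually emulates the register.

```c
#include <stdint.h>

#define PMCR_N_SHIFT	11
#define PMCR_N_MASK	(UINT64_C(0x1f) << PMCR_N_SHIFT)

/* Return the N field (number of event counters) from a PMCR value. */
static unsigned int pmcr_get_n(uint64_t pmcr)
{
	return (unsigned int)((pmcr & PMCR_N_MASK) >> PMCR_N_SHIFT);
}

/* Return pmcr with its N field replaced by n, other bits untouched. */
static uint64_t pmcr_with_n(uint64_t pmcr, unsigned int n)
{
	return (pmcr & ~PMCR_N_MASK) |
	       (((uint64_t)n << PMCR_N_SHIFT) & PMCR_N_MASK);
}
```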

I'm not sure this is aligned with the spirit of the feature.

Colton's aim is to minimize the overheads of trapping the PMU *and*
relying on the perf subsystem for event scheduling. To do dynamic
partitioning as you've described, KVM would need to unconditionally trap
the PMU registers so it can pack the guest counters into the guest
partition. We cannot assume the VM will allocate counters sequentially.
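
To illustrate the packing problem: if a guest picks counters 0, 3 and 5, they would have to be remapped onto physical counters 0..2 of the guest partition, and every guest access would need translating through that map. A minimal sketch, where pack_guest_counters() and MAX_COUNTERS are hypothetical names and not part of any existing code:

```c
#include <stdint.h>

#define MAX_COUNTERS 31	/* hypothetical upper bound on event counters */

/*
 * Pack the guest's in-use counters into the lowest physical slots.
 * map[i] is the physical slot for guest counter i, or -1 if unused.
 * Returns the number of physical slots needed.
 */
static unsigned int pack_guest_counters(uint32_t guest_in_use,
					int map[MAX_COUNTERS])
{
	unsigned int next = 0;

	for (unsigned int i = 0; i < MAX_COUNTERS; i++) {
		if (guest_in_use & (1u << i))
			map[i] = (int)next++;
		else
			map[i] = -1;
	}
	return next;
}
```

Since the guest addresses counters by its own indices, a map like this only works if accesses are trapped, which is the overhead the partitioned PMU is trying to avoid.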

Yeah, I agree, requiring cooperation from the guest probably makes it a non-starter.


Dynamic counter allocation can be had with the existing PMU
implementation. The partitioned PMU is an alternative userspace can
select, not a replacement for what we already have.

Thanks,
Oliver


It's just a shame that it doesn't look like there's a way to make BRBE work properly in guests with the existing implementation. Maybe we're stuck with only allowing it in a partition for now.

Thanks
James