Re: [PATCH v5 00/24] ARM64 PMU Partitioning

From: Colton Lewis

Date: Tue Apr 14 2026 - 15:58:35 EST


Colton Lewis <coltonlewis@xxxxxxxxxx> writes:

Will Deacon <will@xxxxxxxxxx> writes:

On Tue, Dec 09, 2025 at 03:00:59PM -0800, Oliver Upton wrote:
On Tue, Dec 09, 2025 at 08:50:57PM +0000, Colton Lewis wrote:
> This series creates a new PMU scheme on ARM, a partitioned PMU that
> allows reserving a subset of counters for more direct guest access,
> significantly reducing overhead. More details, including performance
> benchmarks, can be read in the v1 cover letter linked below.
>
> An overview of what this series accomplishes was presented at KVM
> Forum 2025. Slides [1] and video [2] are linked below.
>
> The long duration between v4 and v5 is due to time spent on this
> project being monopolized preparing this feature for internal
> production. As a result, there are too many improvements to fully list
> here, but I will cover the notable ones.

Thanks for reposting. I think there's still quite a bit of ground to
cover on the KVM side of this, but I would definitely appreciate it if
someone with more context on the perf side of things could chime in.

Will, IIRC you had some thoughts around counter allocation, right?

Right, I was hoping that the host counter reservation could be more
dynamic than a cmdline option. Perf already has support for pinning
events to a CPU, so the concept of some counters being unavailable
shouldn't be too much for the driver to handle. You might just need to
create some fake pinned events so that perf code understands what is
happening.

Thanks Will. I have a few follow-up questions:

1. Are you suggesting this be done whenever we enter a guest so the host
always has access to the full range in host context? That would be the
most dynamic.

2. How should we handle the possibility a real event already occupies a
counter wanted by the guest? Is there a good way to create our fake
pinned events then force a reschedule so perf moves the real events out
of the way?

3. Is there an existing fake event type that tells perf not to touch
hardware?

4. Can you point to any example code that already does something like
this?

Thank you Will and Mark for meeting with me to discuss things in person.

Here are my main takeaways so the list can comment:

Will's initial idea doesn't work because there is no way for KVM to pin
counters in a way that takes priority over counters already pinned by
the host, so KVM cannot guarantee the reservation.

An alternate idea I am proposing is to call the perf core
sched_in/sched_out functionality during vcpu_load/vcpu_put when guest
counters need to be reserved/unreserved.

That means having perf vacate all the host counters temporarily,
modifying the arm_pmu.cntr_mask to add/remove the appropriate counters,
then having perf schedule all host events back on the new set. Perf is
capable of doing that without any significant changes.

This is simple and should work because arm_pmu.cntr_mask is already
accessible from the vcpu struct and modifying it is already how the
existing boot-time counter reservation works.
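
To make the vacate / modify mask / reschedule sequence concrete, here is
a minimal userspace sketch. It is a toy model under stated assumptions,
not code from the series: struct toy_pmu and every toy_* function are
hypothetical stand-ins, the mask is a plain 64-bit word rather than the
real arm_pmu.cntr_mask, and "scheduling" is just first-fit placement.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_COUNTERS 8   /* hypothetical number of hardware counters */
#define TOY_MAX_EVENTS  16

/* Toy stand-in for per-CPU PMU state; NOT the real struct arm_pmu. */
struct toy_pmu {
	uint64_t cntr_mask;                 /* counters the host may use */
	int      event_idx[TOY_MAX_EVENTS]; /* counter per host event, -1 if none */
	int      nr_events;                 /* number of host events */
};

/* Models perf vacating every host counter. */
static void toy_sched_out_all(struct toy_pmu *pmu)
{
	for (int i = 0; i < pmu->nr_events; i++)
		pmu->event_idx[i] = -1;
}

/* First-fit placement onto counters allowed by cntr_mask.
 * Returns how many host events got a counter. */
static int toy_sched_in_all(struct toy_pmu *pmu)
{
	uint64_t free_mask = pmu->cntr_mask;
	int placed = 0;

	assert(pmu->nr_events <= TOY_MAX_EVENTS);
	for (int i = 0; i < pmu->nr_events; i++) {
		pmu->event_idx[i] = -1;
		for (int c = 0; c < TOY_NR_COUNTERS; c++) {
			if (free_mask & (1ULL << c)) {
				free_mask &= ~(1ULL << c);
				pmu->event_idx[i] = c;
				placed++;
				break;
			}
		}
	}
	return placed;
}

/* Models the vcpu_load side: remove guest counters from the host mask. */
static int toy_vcpu_load(struct toy_pmu *pmu, uint64_t guest_mask)
{
	toy_sched_out_all(pmu);
	pmu->cntr_mask &= ~guest_mask;
	return toy_sched_in_all(pmu);
}

/* Models the vcpu_put side: hand the guest counters back to the host. */
static int toy_vcpu_put(struct toy_pmu *pmu, uint64_t guest_mask)
{
	toy_sched_out_all(pmu);
	pmu->cntr_mask |= guest_mask;
	return toy_sched_in_all(pmu);
}
```

In this model, host events that no longer fit after toy_vcpu_load()
simply stay unscheduled until toy_vcpu_put() restores the mask.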

There are some tradeoffs to this approach that will need further
consideration. The first is how to handle event groups. Perf allows
events to be grouped such that they must all be scheduled in at once. If
the host has a larger group than the number of counters available while
the vcpu is loaded, then it simply won't be able to schedule that group
in for that time period. Another is whether it will be acceptable
performance-wise to put perf sched_in/sched_out in
vcpu_load/vcpu_put. I'm unsure how much delay that would add to those
paths.
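
The event group tradeoff can be illustrated with a small sketch. It
assumes a simplified all-or-nothing rule in which a group fits only if
its size does not exceed the number of counters left to the host; real
perf scheduling has more constraints than this. toy_count_bits() and
toy_group_fits() are hypothetical names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits in the host counter mask (Kernighan's method). */
static int toy_count_bits(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;   /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* A group is all-or-nothing: it fits only if every member gets a counter. */
static bool toy_group_fits(uint64_t host_mask, int group_size)
{
	return group_size <= toy_count_bits(host_mask);
}
```

With eight counters, a six-event host group fits while no vCPU is
loaded, but not while four counters are reserved for the guest.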

Absent strong objections, I will be posting a series using this method.

Another idea, which occurred to me later and was not discussed, is a
middle approach that is less dynamic but gives the user control over
when the perf sched_in/sched_out happens: expose the existing boot-time
parameter as writable in sysfs and do the sched_out/modify
mask/sched_in when that is written rather than in vcpu_load.
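
A rough userspace sketch of the sysfs variant follows. Everything here
is hypothetical: struct toy_pmu_cfg and toy_host_counters_store() only
model the shape of a store handler, the choice to give the host the top
n counters is arbitrary, and the comments mark where the real perf
sched_out/sched_in calls would go.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_NR_COUNTERS 8   /* hypothetical number of hardware counters */

struct toy_pmu_cfg {
	uint64_t host_mask;     /* counters currently left to the host */
	int      resched_count; /* times host events were rescheduled */
};

/* Models a sysfs store: parse the new host counter count and only
 * reschedule host events when the mask actually changes. */
static int toy_host_counters_store(struct toy_pmu_cfg *cfg, const char *buf)
{
	char *end;
	unsigned long n;
	uint64_t new_mask;

	errno = 0;
	n = strtoul(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (n > TOY_NR_COUNTERS)
		return -EINVAL;

	/* Toy policy: the host keeps the top n counters. */
	new_mask = ((1ULL << n) - 1) << (TOY_NR_COUNTERS - n);

	if (new_mask != cfg->host_mask) {
		/* sched_out of all host events would happen here */
		cfg->host_mask = new_mask;
		/* sched_in onto the new mask would happen here */
		cfg->resched_count++;
	}
	return 0;
}
```

The cost of rescheduling is then paid once per write instead of on
every vcpu_load/vcpu_put.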