Re: [RFC PATCH 00/41] KVM: x86/pmu: Introduce passthrough vPM
From: Sean Christopherson
Date: Fri Apr 19 2024 - 15:14:37 EST
On Thu, Apr 18, 2024, Mingwei Zhang wrote:
> On Thu, Apr 11, 2024, Sean Christopherson wrote:
> > <bikeshed>
> >
> > I think we should call this a mediated PMU, not a passthrough PMU. KVM still
> > emulates the control plane (controls and event selectors), while the data is
> > fully passed through (counters).
> >
> > </bikeshed>
> Sean,
>
> I feel "mediated PMU" seems to be a little bit off the ..., no? In
> KVM, almost all of features are mediated. In our specific case, the
> legacy PMU is mediated by KVM and perf subsystem on the host. In new
> design, it is mediated by KVM only.
Currently, at a feature level, I mentally bin things into two rough categories
in KVM:
  1. Virtualized - Guest state is loaded into hardware, or hardware supports
                   running with both host and guest state (e.g. TSC scaling), and
                   the guest has full read/write access to its state while running.
  2. Emulated    - Guest state is never loaded into hardware, and instead the
                   feature/state is emulated in software.
There is no "Passthrough" because that's (mostly) covered by my Virtualized
definition. And because I also think of passthrough as being about *assets*,
not about the features themselves.
They are far from perfect definitions, e.g. individual assets can be passed through,
virtualized by hardware, or emulated in software. But for the most part, I think
classifying features as virtualized vs. emulated works well, as it helps reason
about the expected behavior and performance of a feature.
E.g. for some virtualized features, certain assets may need to be explicitly passed
through, e.g. access to x2APIC MSRs for APICv. But APICv itself still falls
into the virtualized category, e.g. the "real" APIC state isn't passed through
to the guest.
If KVM didn't already have a PMU implementation to deal with, this wouldn't be
an issue, e.g. we'd just add "enable_pmu" and I'd mentally bin it into the
virtualized category. But we need to distinguish between the two PMU models,
and using "enable_virtualized_pmu" would be comically confusing for users. :-)
And because this is user visible, I would like to come up with a name that (some)
KVM users will already be familiar with, i.e. will have some chance of intuitively
understanding without having to go read docs.
Which is why I proposed "mediated"; what we are proposing for the PMU is similar
to the "mediated device" concepts in VFIO. And I also think "mediated" is a good
fit in general, e.g. this becomes my third classification:
  3. Mediated - Guest state is context switched at VM-Enter/VM-Exit, i.e. is
                loaded into hardware, but the guest does NOT have full read/write
                access to the feature.
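
A rough sketch of how the three bins relate, reduced to the two questions the
definitions above turn on. All names here are hypothetical and exist only for
illustration; nothing like this is in KVM.

```c
#include <stdbool.h>

/* Hypothetical classification, mirroring the three bins in this mail. */
enum feature_class {
	FEATURE_EMULATED,	/* guest state is never loaded into hardware */
	FEATURE_MEDIATED,	/* in hardware, but guest access is restricted */
	FEATURE_VIRTUALIZED,	/* in hardware, guest has full read/write access */
};

/*
 * Classify a feature from two properties:
 *  - state_in_hardware: guest state is loaded into (or coexists in) hardware
 *  - guest_full_rw:     guest has full read/write access while running
 */
static enum feature_class classify_feature(bool state_in_hardware,
					   bool guest_full_rw)
{
	if (!state_in_hardware)
		return FEATURE_EMULATED;

	return guest_full_rw ? FEATURE_VIRTUALIZED : FEATURE_MEDIATED;
}
```

Under this sketch the proposed PMU lands in FEATURE_MEDIATED: its state is in
hardware while the guest runs, but the guest does not get unrestricted access.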
But my main motivation for using "mediated" really is that I hope that it will
help KVM users grok the basic gist of the design without having to read and
understand KVM documentation, because there is already existing terminology in
the broader KVM space.
> We intercept the control plane in current design, but the only thing
> we do is the event filtering. No fancy code change to emulate the control
> registers. So, it is still a passthrough logic.
It's not though. Passthrough very specifically means the guest has unfettered
access to some asset, and/or KVM does no filtering/adjustments whatsoever.
"Direct" is similar, e.g. KVM uses "direct" in MMU context to refer to addresses
that don't require KVM to intervene and translate. E.g. entire MMUs can be direct,
but individual shadow pages can also be direct (no corresponding guest PTE to
translate).
For this flavor of PMU, it's not full passthrough or direct. Some assets are
passed through, e.g. PMCs, but others are not.
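
To make the asset-level view concrete, here is a minimal sketch that tracks
passthrough per asset and only calls the feature "full passthrough" if every
asset qualifies. The enum and table names are hypothetical, and the split
simply follows what is described in this thread (counters passed through,
controls and event selectors intercepted); it is not KVM's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not KVM data structures. */
enum asset_access {
	ASSET_PASSED_THROUGH,	/* guest accesses hardware directly */
	ASSET_INTERCEPTED,	/* accesses exit to KVM for filtering/emulation */
};

enum pmu_asset {
	PMU_ASSET_COUNTERS,
	PMU_ASSET_EVENT_SELECTORS,
	PMU_ASSET_CONTROLS,
	PMU_ASSET_COUNT,
};

/* The split discussed in this thread for the proposed PMU. */
static const enum asset_access mediated_pmu[PMU_ASSET_COUNT] = {
	[PMU_ASSET_COUNTERS]        = ASSET_PASSED_THROUGH,
	[PMU_ASSET_EVENT_SELECTORS] = ASSET_INTERCEPTED,
	[PMU_ASSET_CONTROLS]        = ASSET_INTERCEPTED,
};

/* A feature is "full passthrough" only if every one of its assets is. */
static bool is_full_passthrough(const enum asset_access *assets, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (assets[i] != ASSET_PASSED_THROUGH)
			return false;
	}
	return true;
}
```

With this table, is_full_passthrough(mediated_pmu, PMU_ASSET_COUNT) is false,
which is the point: passthrough describes individual assets, not the PMU as a
whole.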
> In some (rare) business cases, I think maybe we could fully passthrough
> the control plane as well. For instance, sole-tenant machine, or
> full-machine VM + full offload. In case if there is a cpu errata, KVM
> can force vmexit and dynamically intercept the selectors on all vcpus
> with filters checked. It is not supported in current RFC, but maybe
> doable in later versions.
Heh, that's an argument for using something other than "passthrough", because if
we ever do support such a use case, we'd end up with enable_fully_passthrough_pmu,
or in the spirit of KVM shortlogs, really_passthrough_pmu :-)
Though I think even then I would vote for "enable_dedicated_pmu", or something
along those lines, purely to avoid overloading "passthrough", i.e. to try to use
passthrough strictly when talking about assets, not features. And because unless
we can also pass through LVTPC, it still wouldn't be a complete passthrough of the
PMU, as KVM would be emulating PMIs.