Re: [RFC 0/3] Export APICv-related state via binary stats interface

From: Sean Christopherson
Date: Tue Apr 16 2024 - 18:35:16 EST


On Tue, Apr 16, 2024, Paolo Bonzini wrote:
> On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> > > The goal of this RFC is to agree on a mechanism for querying the state (and
> > > related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> > > topic since that is the side that I have mostly looked at, and has the greater
> > > number of possible inhibits, but I believe the argument applies for both
> > > vendor's technologies.
> > >
> > > Currently, a user or monitoring app trying to determine if APICv is actually
> > > being used needs implementation-specific knowlegde in order to look for specific
> > > types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> > > by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> > > (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> > > tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> > > downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> > > this information, but there has not been any development in that direction.
> > > Sean has mentioned a preference for using BPF to extract info from the current
> > > tracepoints, which would require reworking existing structs to access some
> > > desired data, but as far as I know there isn't any work done on that approach
> > > yet.
> > >
> > > Recently Joao mentioned another alternative: the binary stats framework that is
> > > already supported by kernel[1] and QEMU[2].
> >
> > The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
> > once they're added, and KVM needs to maintain the exact behavior.
>
> Stats are not ABI---why would they be?

Because they exposed through an ioctl(), and userspace can and will use stats for
functional purposes? Maybe I just had the wrong takeaway from an old thread about
adding a big pile of stats[1], where unfortunately (for me) you weighed in on
whether or not tracepoints are ABI, but not stats.

And because I've have been operating under the assumption that stats are ABI, I've
been guiding people to using stats to make decisions in userspace. E.g. KVM doesn't
currently expose is_guest_mode() in kvm_run, but it is a stat, so it's not hard
to imagine userspace using the stat to make decisions without needing to call back
into KVM[2].

And based on the old discussion[1] I doubt I'm though only one that has this view.

That said, I'm definitely not opposed to stats _not_ being ABI, because that would
give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
that are super useful _for us_, but are rather heavy and might not be desirable
for most environments. If stats aren't considered ABI, then I'm pretty sure we
could land some of the more generally useful ones upstream, but off-by-default
and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
are very helpful in identifying performance issues, but they aren't things I would
want enabled by default.

But if we do decide stats aren't ABI, then we need to document and disseminate
that information.

[1] https://lore.kernel.org/all/CALzav=cuzT=u6G0TCVZUfEgAKOCKTSCDE8x2v5qc-Gd_NL-pzg@xxxxxxxxxxxxxx
[2] https://lore.kernel.org/all/Zh6-e9hy7U6DD2QM@xxxxxxxxxx

> They have an established meaning and it's not a good idea to change it, but
> it's not an absolute no-no(*); and removing them is not a problem at all.
>
> For example, if stats were ABI, there would be no need for the
> introspection mechanism, you could just use a struct like ethtool
> stats (which *are* ABO).
>
> Not everything makes a good stat but, if in doubt and it's cheap
> enough to collect it, go ahead and add it.

Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
consumes an extra KiB or three of memory.

A few APIC stats obviously aren't going to move the needle much, I'm just pointing
out that not everyone agrees that KVM should be hyper permissive when it comes to
adding new stats.

> Cheap collection is the important point, because tracepoints in a hot path
> can be so expensive as to slow down the guest substantially, at least in
> microbenchmarks.
>
> In this case I'm not sure _all_ inhibits makes sense and I certainly
> wouldn't want a bitmask, but a generic APICv-enabled stat certainly
> makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Paolo
>
> (*) you have to draw a line somewhere. New processor models may
> virtualize parts of the CPU in such a way that some stats become
> meaningless or just stay at zero. Should KVM not support those
> features because it is not possible anymore to introspect the guest
> through stat?