Re: [RFC 0/3] Export APICv-related state via binary stats interface
From: Paolo Bonzini
Date: Tue Apr 16 2024 - 17:39:35 EST
On Tue, Apr 16, 2024 at 11:29 PM Alejandro Jimenez
<alejandro.j.jimenez@xxxxxxxxxx> wrote:
>
>
>
> On 4/16/24 15:51, Paolo Bonzini wrote:
> > On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> >>> The goal of this RFC is to agree on a mechanism for querying the state (and
> >>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> >>> topic since that is the side that I have mostly looked at, and has the greater
> >>> number of possible inhibits, but I believe the argument applies for both
> >>> vendor's technologies.
> >>>
> >>> Currently, a user or monitoring app trying to determine if APICv is actually
> >>> being used needs implementation-specific knowlegde in order to look for specific
> >>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> >>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> >>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> >>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> >>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> >>> this information, but there has not been any development in that direction.
> >>> Sean has mentioned a preference for using BPF to extract info from the current
> >>> tracepoints, which would require reworking existing structs to access some
> >>> desired data, but as far as I know there isn't any work done on that approach
> >>> yet.
> >>>
> >>> Recently Joao mentioned another alternative: the binary stats framework that is
> >>> already supported by kernel[1] and QEMU[2].
> >>
> >> The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
> >> once they're added, and KVM needs to maintain the exact behavior.
> >
> > Stats are not ABI---why would they be? They have an established
> > meaning and it's not a good idea to change it, but it's not an
> > absolute no-no(*); and removing them is not a problem at all.
> >
> > For example, if stats were ABI, there would be no need for the
> > introspection mechanism, you could just use a struct like ethtool
> > stats (which *are* ABO).
> >
> > Not everything makes a good stat but, if in doubt and it's cheap
> > enough to collect it, go ahead and add it. Cheap collection is the
> > important point, because tracepoints in a hot path can be so expensive
> > as to slow down the guest substantially, at least in microbenchmarks.
> >
> > In this case I'm not sure _all_ inhibits makes sense and I certainly
> > wouldn't want a bitmask,
>
> I wanted to be able to query enough info via stats (i.e. is APICv active, and if
> not, why is it inhibited?) that is exposed via the other interfaces which are not
> always available. That unfortunately requires a bit of "overloading" of
> the stat as I mentioned earlier, but it remains cheap to collect and expose.
>
> What would be your preferred interface to provide the (complete) APICv state?
>
> but a generic APICv-enabled stat certainly
> > makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Can you expand on what you mean by "weirdly-configured"? Do you mean something
> like virtual wire mode?
I mean any of
APICV_INHIBIT_REASON_HYPERV,
APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
which in practice are always going to be APICV_INHIBIT_REASON_HYPERV
on 99.99% of the gues
ExtINT is visible via irq_window_exits; if you are not running nested,
do not have the problematic configurations above. don't have debugging
(BLOCKIRQ) or in-kernel PIT with reinjection, that's basically the
only one that's left.
Paolo
> Alejandro
>
> >
> > Paolo
> >
> > (*) you have to draw a line somewhere. New processor models may
> > virtualize parts of the CPU in such a way that some stats become
> > meaningless or just stay at zero. Should KVM not support those
> > features because it is not possible anymore to introspect the guest
> > through stat?
> >
> >> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
> >> comes to adding/expanding tracepoints, specifically because there are no guarantees
> >> provided to userspace.
> >>
> >
> >
>