Re: [RFC 0/3] Export APICv-related state via binary stats interface

From: Paolo Bonzini
Date: Wed Apr 17 2024 - 05:48:28 EST


On Wed, Apr 17, 2024 at 12:35 AM Sean Christopherson <seanjc@googlecom> wrote:
> > > The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
> > > once they're added, and KVM needs to maintain the exact behavior.
> >
> > Stats are not ABI---why would they be?
>
> Because they exposed through an ioctl(), and userspace can and will use stats for
> functional purposes? Maybe I just had the wrong takeaway from an old thread about
> adding a big pile of stats[1], where unfortunately (for me) you weighed in on
> whether or not tracepoints are ABI, but not stats.

I think we can agree that:
- you don't want hundreds of stats (Marc's point)
- a large part of the stats are very stable, but just as many (while
useful) depend on details which are very much implementation dependent
- a subset of stats is pretty close to being ABI (e.g. guest_mode),
but others can change meaning depending on processor model, guest
configuration and/or guest type (e.g. all of them affect
interrupt-related stats due to APICv inhibits).

While there are exceptions, the main consumer of stats (but indeed not
the only one) is intended to be the user, not a program. This is the
same as tracepoints, and it's why the introspection bits exist.
(User-friendliness also means that bitmask stats are "ouch"; I guess
we could add support for bit-sized boolean stats is things get out of
control).

For many stats, using them for functional purposes would be
wrong/dumb. You have to be ready for the exact behavior to change even
if the stats remain the same. If userspace doesn't, it's being dumb.
KVM can't be blocked from supporting new features just because they
"break" stats, and shouldn't be blocked from adding useful debugging
stats just because userspace could be dumb.

For example, the point of pf_fast is mostly to compare it with other
pf_* stats and see if there's something smelly going on.pf_fast used
to affect pretty much only dirty bits; nowadays it also affects
accessed bits on !ept_ad machines and it does not affect dirty bits if
you have PML. So in the past it was possible to use pf_fast as a proxy
for the number of dirty pages, for example during migration. That
usage doesn't work anymore. Tough luck.

Perhaps you could use the halt-polling stats to toggle DISABLE_EXITS
for VMs that consume a lot of time polling. That would be a more
plausible use of stats to drive heuristics, but again, the power to
look into low-level details of KVM and guest behavior comes with the
accompanying responsibility. It is _not_ guaranteed to keep working in
the same way as processors come out and optimizations are added to
KVM.

So you would have to look at intentional stats breakages one by one,
and this is again a lot like tracepoints. And many potential breakages
would go unnoticed anyway, because there's an infinite supply of bad
ideas when it comes to stats and heuristics.

> That said, I'm definitely not opposed to stats _not_ being ABI, because that would
> give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
> that are super useful _for us_, but are rather heavy and might not be desirable
> for most environments. If stats aren't considered ABI, then I'm pretty sure we
> could land some of the more generally useful ones upstream, but off-by-default
> and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
> are very helpful in identifying performance issues, but they aren't things I would
> want enabled by default.

That would be great indeed.

> > Not everything makes a good stat but, if in doubt and it's cheap
> > enough to collect it, go ahead and add it.
>
> Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
> death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
> consumes an extra KiB or three of memory.
>
> A few APIC stats obviously aren't going to move the needle much, I'm just pointing
> out that not everyone agrees that KVM should be hyper permissive when it comes to
> adding new stats.

Yeah, that's why I made it conditional to "if in doubt". "Stats are
not ABI" is not a free pass to add anything, also because the truth is
closer than "Stats are generally not ABI but keeping them stable is a
good idea". Many more stats are obviously bad to have upstream, than
there are good ones; and when adding stats it makes sense to consider
their stability but without making it an absolute criterion for
inclusion.

So for this patch, I would weigh advantages to be substantial:
+ APICv inhibits at this point are relatively stable
+ the performance impact is large enough that APICv/AVIC stats _can_
be useful, both boolean and cumulative ones; so for example I'd add an
interrupt_injections stat for unaccelerated injections causing a
vmexit or otherwise hitting lapic.c.

But absolutely would not go with a raw bitmask because:
- the exact set of inhibits is subject to change
- super high detail into niche APICv inhibits is unlikely to be useful
- many if not most inhibits are trivially derived from VM configuration

Paolo