Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

From: Christoffer Dall
Date: Wed Dec 12 2018 - 03:07:47 EST


On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > [ Reviving old thread. ]
> >
> > Andrew Murray <andrew.murray@xxxxxxx> writes:
> > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > >> Andrew Murray <andrew.murray@xxxxxxx> writes:
> > >>
> > >> > Update design.txt to reflect the presence of the exclude_host
> > >> > and exclude_guest perf flags.
> > >> >
> > >> > Signed-off-by: Andrew Murray <andrew.murray@xxxxxxx>
> > >> > ---
> > >> > tools/perf/design.txt | 4 ++++
> > >> > 1 file changed, 4 insertions(+)
> > >> >
> > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > >> > index a28dca2..7de7d83 100644
> > >> > --- a/tools/perf/design.txt
> > >> > +++ b/tools/perf/design.txt
> > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > >> > way to request that counting of events be restricted to times when the
> > >> > CPU is in user, kernel and/or hypervisor mode.
> > >> >
> > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > >> > +to request counting of events restricted to guest and host contexts when
> > >> > +using virtualisation.
> > >>
> > >> How does exclude_host differ from exclude_hv ?
> > >
> > > I believe exclude_host / exclude_guest are intended to distinguish
> > > between host and guest in the hosted hypervisor context (KVM).
> >
> > OK yeah, from the perf-list man page:
> >
> > u - user-space counting
> > k - kernel counting
> > h - hypervisor counting
> > I - non idle counting
> > G - guest counting (in KVM guests)
> > H - host counting (not in KVM guests)
> >
> > > Whereas exclude_hv allows to distinguish between guest and
> > > hypervisor in the bare-metal type hypervisors.
> >
> > Except that's exactly not how we use them on powerpc :)
> >
> > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > it's KVM or PowerVM (which is a bare-metal hypervisor).
> >
> > We don't use exclude_host / exclude_guest at all, which I guess is a
> > bug, except I didn't know they existed until this thread.
> >
> > eg, in a KVM guest:
> >
> > $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> > $ perf report -D | grep -Fc "dso: [hypervisor]"
> > 16
> >
> >
> > > In the case of arm64 - if VHE extensions are present then the host
> > > kernel will run at a higher privilege to the guest kernel, in which
> > > case there is no distinction between hypervisor and host so we ignore
> > > exclude_hv. But where VHE extensions are not present then the host
> > > kernel runs at the same privilege level as the guest and we use a
> > > higher privilege level to switch between them - in this case we can
> > > use exclude_hv to discount that hypervisor role of switching between
> > > guests.
> >
> > I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight, as I am currently adding support for
> this; see [1].
>
> >
> > And I don't see any x86 code using exclude_hv.
>
> I can't find any either.
>
> >
> > But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2] and where exclude_hv was added, see [3]
>
> Generally it seems that exclude_guest/exclude_host relies upon switching
> counters off/on on guest/host switch code (which works well in the nested
> virt case). Whereas exclude_hv tends to rely solely on hardware capability
> based on privilege level (which works well in the bare metal case where
> the guest doesn't run at same privilege as the host).
>
> I think from the user perspective exclude_hv allows you to see your overhead
> if you are a guest (i.e. work done by bare metal hypervisor associated with
> you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
> see events above you (i.e. the kernel hypervisor) if you are the guest...
>
> At least that's how I read this, I've copied in others that may provide
> more authoritative feedback.
>
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
> [2] https://www.spinics.net/lists/kvm/msg53996.html
> [3] https://lore.kernel.org/patchwork/patch/143918/
>

I'll try to answer this in a different way, based on previous
discussions with Joerg et al. who introduced these flags. Assume no
support for nested virtualization as a first approximation:

If you are running as a guest:
- exclude_hv: stop counting events when the hypervisor runs
- exclude_host: has no effect
- exclude_guest: has no effect

If you are running as a host/hypervisor:
- exclude_hv: has no effect
- exclude_host: only count events when the guest is running
- exclude_guest: only count events when the host is running

With nested virtualization, you get the natural union of the above.
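
As a rough sketch only (this is not kernel code), the non-nested rules
above can be modelled as a small Python function. The name is_counted
and the "role" / "running" strings are made up for illustration; only
the three exclude_* flag names come from perf itself.

```python
def is_counted(role, running, exclude_hv=False, exclude_host=False,
               exclude_guest=False):
    """Model of the non-nested rules: is an event counted?

    role    -- where perf itself runs: "guest" or "host" (hypothetical labels)
    running -- what the CPU is executing when the event occurs:
               for role "guest": "guest" or "hypervisor"
               for role "host":  "host" or "guest"
    """
    if role == "guest":
        # Only exclude_hv matters; exclude_host/exclude_guest have no effect.
        if running == "hypervisor":
            return not exclude_hv
        if running == "guest":
            return True
    elif role == "host":
        # Only exclude_host/exclude_guest matter; exclude_hv has no effect.
        if running == "host":
            return not exclude_host
        if running == "guest":
            return not exclude_guest
    raise ValueError(
        f"unsupported combination: role={role!r}, running={running!r}")


if __name__ == "__main__":
    cases = [("guest", "guest"), ("guest", "hypervisor"),
             ("host", "host"), ("host", "guest")]
    for role, running in cases:
        for flag in ("exclude_hv", "exclude_host", "exclude_guest"):
            counted = is_counted(role, running, **{flag: True})
            print(f"role={role:5} running={running:10} {flag:13} "
                  f"counted={counted}")
```

Note that "host" here covers everything the host side executes, so the
model deliberately has no separate case for how a given hypervisor
splits its own execution.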

This has nothing to do with the design of the hypervisor, such as the
ARM non-VHE KVM which splits its execution across EL1 and EL2 -- those
are both considered host from the point of view of Linux as a hypervisor
using KVM, and both considered hypervisor from the point of view of a
guest.


Thanks,

Christoffer