Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
From: Steve Rutherford
Date: Thu Feb 25 2021 - 18:26:29 EST
On Thu, Feb 25, 2021 at 2:59 PM Steve Rutherford <srutherford@xxxxxxxxxx> wrote:
>
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead Command Shared Object Symbol
> > > > # ........ ............... ................ .........................
> > > > #
> > > > 100.00% qemu-system-x86 [kernel.vmlinux] [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But then Brijesh pointed out to me that OVMF currently
> > > > causes a lot of VMEXITs because it doesn't use the DMA pool to minimize the C-bit toggles.
> > > > In other words, the OVMF bounce buffer does a page state change on every DMA allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead Trace output
> > > > # ........ ........................
> > > > #
> > > > 99.77% reason KVM_EXIT_IO (2)
> > > > 0.23% reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above add a prerequisite that OVMF needs to be optimized
> > > > before hypercall pass-through can be done?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement. Assuming a conservative roundtrip time of 50k
> > > cycles, those 18K hypercalls will add well under half a second of boot time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > > more like 50 milliseconds.
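
The estimate quoted above can be reproduced with a quick back-of-the-envelope
calculation. This is only a sketch: the 3 GHz clock is an assumption (no CPU
frequency is given in the thread), and the function name is made up for
illustration.

```python
# Rough boot-time overhead of exiting to userspace on every hypercall.
# ASSUMED_CPU_HZ is an assumption; the thread does not state a clock speed.
ASSUMED_CPU_HZ = 3.0e9
N_HYPERCALLS = 18_000  # approximate OVMF hypercall count quoted above


def hypercall_overhead_seconds(n_hypercalls, cycles_per_roundtrip, cpu_hz):
    """Total added time if every hypercall costs one userspace roundtrip."""
    return n_hypercalls * cycles_per_roundtrip / cpu_hz


for cycles in (50_000, 10_000):
    seconds = hypercall_overhead_seconds(N_HYPERCALLS, cycles, ASSUMED_CPU_HZ)
    print(f"{cycles} cycles/roundtrip: {seconds * 1000:.0f} ms")
```

At the assumed 3 GHz this gives roughly 300 ms for 50k-cycle roundtrips and
roughly 60 ms for 10k-cycle roundtrips, in line with the figures above.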
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of this
> > > new hypercall. I assume it's not cheap to convert a page between encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
>
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the KVM side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the CPUID bit.
>
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
>
> --Steve
This could even go a step further, and use an MSR write from within
the guest instead of a hypercall, which could be patched through to
userspace without host modification, if I understand the MSR filtering
correctly.
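
To illustrate the routing I have in mind, here is a toy model of the MSR
filter decision, not real KVM code. It assumes the semantics as I read them
in the KVM API documentation: a filter range carries a bitmap where a 1 bit
allows the access and a 0 bit denies it, and a denied write exits to
userspace if userspace has opted in to filter exits (otherwise the guest
gets a #GP). The MSR index and all names below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical MSR index, for illustration only; not taken from the series.
PAGE_ENC_STATUS_MSR = 0x4B564DF0


@dataclass
class WriteFilterRange:
    base: int      # first MSR index covered by this range
    nmsrs: int     # number of consecutive MSRs covered
    bitmap: bytes  # bit i covers MSR base+i; 1 = allow, 0 = deny


def route_wrmsr(index, ranges, default_allow=True, exit_on_filter=True):
    """Return where a guest WRMSR ends up: 'kernel', 'userspace', or 'gp'."""
    allowed = default_allow
    for r in ranges:
        off = index - r.base
        if 0 <= off < r.nmsrs:
            allowed = bool((r.bitmap[off // 8] >> (off % 8)) & 1)
            break
    if allowed:
        return "kernel"
    # Denied writes reach userspace only if it opted in to filter exits.
    return "userspace" if exit_on_filter else "gp"


# Deny writes to the one hypothetical MSR so they are bounced to userspace.
ranges = [WriteFilterRange(PAGE_ENC_STATUS_MSR, 1, b"\x00")]
print(route_wrmsr(PAGE_ENC_STATUS_MSR, ranges))  # the filtered MSR
print(route_wrmsr(0x10, ranges))                 # any MSR outside the range
```

In this model only writes to the single filtered MSR leave the kernel; every
other MSR keeps its normal in-kernel handling, which is why no host-side
change would be needed.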
--Steve