Re: [PATCH v2 3/5] KVM: X86: Boost vCPU which is in critical section

From: Peter Zijlstra
Date: Thu Apr 14 2022 - 04:08:41 EST


On Wed, Apr 13, 2022 at 09:43:03PM +0000, Sean Christopherson wrote:
> +tglx and PeterZ
>
> On Fri, Apr 01, 2022, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> >
> > A semantic gap arises when a guest OS is preempted while executing
> > its own critical section, and this degrades application scalability.
> > We try to bridge this gap by passing the guest preempt_count to the
> > host and checking the guest IRQ-disable state, so that the hypervisor
> > knows whether the guest OS is running in a critical section. The
> > hypervisor's yield-on-spin heuristics can then be smarter and boost
> > the vCPU candidate that is in a critical section to mitigate this
> > preemption problem; such a vCPU is also more likely to be a
> > potential lock holder.
> >
> > Tested on a 96-HT, 2-socket Xeon CLX server with 96-vCPU, 100GB-RAM
> > VMs: one VM runs the benchmark while the other (none-2) VMs run
> > CPU-bound workloads. There is no performance regression for other
> > benchmarks such as UnixBench.
>
> ...
>
> > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> > ---
> > arch/x86/kvm/x86.c | 22 ++++++++++++++++++++++
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/kvm_main.c | 7 +++++++
> > 3 files changed, 30 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9aa05f79b743..b613cd2b822a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10377,6 +10377,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >  	return r;
> >  }
> >
> > +static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)
> > +{
> > +	int count;
> > +
> > +	if (!vcpu->arch.pv_pc.preempt_count_enabled)
> > +		return false;
> > +
> > +	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
> > +				   &count, sizeof(int)))
> > +		return !(count & ~PREEMPT_NEED_RESCHED);
>
> As I pointed out in v1[*], this makes PREEMPT_NEED_RESCHED, and to some extent
> the entire __preempt_count, part of the KVM guest/host ABI. That needs acks from
> the sched folks, and if they're ok with it, it needs to be formalized somewhere in
> kvm_para.h, not buried in the KVM host code.

Right, not going to happen. There have been plenty of changes to
__preempt_count over the past years, which suggests that making it ABI
would be an incredibly bad idea.

It also solves only part of the problem, namely spinlocks; it doesn't
help at all with mutexes, which can be equally short-lived, as evidenced
by the adaptive spinning mutex code, etc.

Also, I'm not sure I fully understand the problem; doesn't the paravirt
spinlock code give sufficient clues?