Re: [PATCH 2/3] KVM: X86: Bail out of direct yield in case of undercomitted scenarios

From: Sean Christopherson
Date: Wed May 12 2021 - 15:35:21 EST


On Wed, May 12, 2021, Wanpeng Li wrote:
> On Wed, 12 May 2021 at 05:44, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Sat, May 08, 2021, Wanpeng Li wrote:
> > > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> > >
> > > In case of undercomitted scenarios, vCPU can get scheduling easily,
> > > kvm_vcpu_yield_to adds extra overhead, we can observe a lot of race
> > > between vcpu->ready is true and yield fails due to p->state is
> > > TASK_RUNNING. Let's bail out is such scenarios by checking the length
> > > of current cpu runqueue.
> > >
> > > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> > > ---
> > > arch/x86/kvm/x86.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5bd550e..c0244a6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -8358,6 +8358,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
> > > struct kvm_vcpu *target = NULL;
> > > struct kvm_apic_map *map;
> > >
> > > + if (single_task_running())
> > > + goto no_yield;
> > > +
> >
> > Hmm, could we push the result of kvm_sched_yield() down into the guest?
> > Currently the guest bails after the first attempt, which is perfect for this
> > scenario, but it seems like it would make sense to keep trying to yield if there
> > are multiple preempted vCPUs and
>
> It can have a race in case of sustain yield if there are multiple
> preempted vCPUs , the vCPU which you intend to yield may have already
> completed to handle IPI and be preempted now when the yielded sender
> is scheduled again and checks the next preempted candidate.

Ah, right, don't want to penalize the happy case.

> > Unrelated to this patch, but it's the first time I've really looked at the guest
> > side of directed yield...
> >
> > Wouldn't it also make sense for the guest side to hook .send_call_func_single_ipi?
>
> reschedule ipi is called by .smp_send_reschedule hook, there are a lot
> of researches intend to accelerate idle vCPU reactivation, my original
> attemption is to boost synchronization primitive, I believe we need a
> lot of benchmarkings to consider inter-VM fairness and performance
> benefit for hooks .send_call_func_single_ipi and
> .smp_send_reschedule.

I was thinking of the 2 vCPU case. If the VM has 2 vCPUs, then this

/*
* Choose the most efficient way to send an IPI. Note that the
* number of CPUs might be zero due to concurrent changes to the
* provided mask.
*/
if (nr_cpus == 1)
send_call_function_single_ipi(last_cpu);
else if (likely(nr_cpus > 1))
arch_send_call_function_ipi_mask(cfd->cpumask_ipi);

means .send_call_func_single_ipi() will always be used to send an IPI to the
other vCPU, and thus 2 vCPU VMs will never utilize PV yield.