Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
From: Paolo Bonzini
Date: Sun Sep 24 2017 - 09:06:04 EST
----- Original Message -----
> From: "Peter Zijlstra" <peterz@xxxxxxxxxxxxx>
> To: "Paolo Bonzini" <pbonzini@xxxxxxxxxx>
> Cc: "Marcelo Tosatti" <mtosatti@xxxxxxxxxx>, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx>, mingo@xxxxxxxxxx,
> kvm@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, "Thomas Gleixner" <tglx@xxxxxxxxxxxxx>
> Sent: Saturday, September 23, 2017 3:41:14 PM
> Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
>
> On Sat, Sep 23, 2017 at 12:56:12PM +0200, Paolo Bonzini wrote:
> > On 22/09/2017 14:55, Peter Zijlstra wrote:
> > > You just explained it yourself. If the thread that needs to complete
> > > what you're waiting on has lower priority, it will _never_ get to run if
> > > you're busy waiting on it.
> > >
> > > This is _trivial_.
> > >
> > > And even for !RT it can be quite costly, because you can end up having
> > > to burn your entire slot of CPU time before you run the other task.
> > >
> > > Userspace spinning is _bad_, do not do this.
> >
> > This is not userspace spinning, it is guest spinning---which has
> > effectively the same effect but you cannot quite avoid.
>
> So I'm virt illiterate and have no clue on how all this works; but
> wasn't this a vmexit ? (that's what marcelo traced). And once you've
> done a vmexit you're a regular task again, not a vcpu.

His trace simply shows that the timer tick happened and the SCHED_NORMAL
thread was preempted. Bumping the vCPU thread to SCHED_FIFO drops
the scheduler tick (the system is NOHZ_FULL), and thus 1) the frequency
of EXTERNAL_INTERRUPT vmexits drops to one per second and 2) the thread is
no longer preempted.
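
For illustration only, here is a rough sketch of what "bumping a thread to
SCHED_FIFO" means from userspace. It is not what QEMU or libvirt actually
does; the helper name make_thread_fifo and the priority passed to it are
made up for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: switch a thread to SCHED_FIFO at priority 'prio'.
 * Returns 0 on success or an errno value; EPERM is the usual failure when
 * the caller lacks CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.
 */
static int make_thread_fifo(pthread_t thread, int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;

    /* pthread_setschedparam() returns the error number directly. */
    return pthread_setschedparam(thread, SCHED_FIFO, &sp);
}
```

A vCPU thread would call something like make_thread_fifo(pthread_self(), 1)
and has to cope with EPERM when it is not allowed to use RT policies.
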
> > But I agree that the solution is properly prioritizing threads that can
> > interrupt the VCPU, and using PI mutexes.
>
> Right, if you want to run RT VCPUs the whole emulator/vcpu interaction
> needs to be designed for RT.
>
> > I'm not a priori opposed to paravirt scheduling primitives, but I am not
> > at all sure that it's required.
>
> Problem is that the proposed thing doesn't solve anything. There is
> nothing that prohibits the guest from triggering a vmexit while holding
> a spinlock and landing in the self-same problems.

Well, part of configuring virt for RT (at all levels: host hypervisor+QEMU
and guest kernel+userspace) is ensuring that vmexits while holding a spinlock
are either confined to one vCPU or are handled in the host hypervisor very
quickly, say in less than 2000 clock cycles.

So I'm not denying that Marcelo's approach solves the problem, but it's very
heavyweight and it masks an important misconfiguration (as you write above,
everything needs to be RT and the priorities must be designed carefully).

_However_, even if you do this, you may want to put the less important vCPUs
and the emulator threads on the same physical CPU. In that case, the vCPU
can run under SCHED_RR to avoid starvation (while the emulator thread needs
to stay at SCHED_FIFO with a higher priority). Some kind of trick that bumps
spinlock critical sections in that vCPU to SCHED_FIFO, for a limited time only,
might still be useful.
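
To make the shared-CPU setup concrete, a minimal sketch under my own
assumptions: the function names and the "emulator priority = vCPU
priority + 1" rule are invented for illustration, and pinning both
threads to the same physical CPU is left out.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: apply a scheduling policy and priority to a thread. */
static int set_policy(pthread_t t, int policy, int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return pthread_setschedparam(t, policy, &sp);
}

/*
 * Hypothetical helper: put the less important vCPU at SCHED_RR and the
 * emulator thread at SCHED_FIFO one priority level above it.
 * Returns 0 on success or an errno value.
 */
static int configure_shared_cpu(pthread_t vcpu, pthread_t emulator,
                                int vcpu_prio)
{
    int rc;

    /* The emulator needs a valid priority strictly above the vCPU's. */
    if (vcpu_prio < sched_get_priority_min(SCHED_RR) ||
        vcpu_prio + 1 > sched_get_priority_max(SCHED_FIFO))
        return EINVAL;

    rc = set_policy(emulator, SCHED_FIFO, vcpu_prio + 1);
    if (rc)
        return rc;

    return set_policy(vcpu, SCHED_RR, vcpu_prio);
}
```

The temporary bump to SCHED_FIFO around spinlock critical sections is not
shown here.
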
Paolo