Re: [patch 2/3] KVM: x86: KVM_HC_RT_PRIO hypercall (host-side)
From: Paolo Bonzini
Date: Fri Sep 22 2017 - 03:23:54 EST
On 22/09/2017 03:08, Marcelo Tosatti wrote:
> On Thu, Sep 21, 2017 at 03:49:33PM +0200, Paolo Bonzini wrote:
>> On 21/09/2017 15:32, Konrad Rzeszutek Wilk wrote:
>>> So the guest can change the scheduling decisions at the host level?
>>> And the host HAS to follow it? There is no policy override for the
>>> host to say - nah, not going to do it?
>
> In that case the host should not even configure the guest with this
> option (this is QEMU's 'enable-rt-fifo-hc' option).
>
>>> Also wouldn't the guest want to always be at SCHED_FIFO? [I am thinking
>>> of a guest admin who wants all the CPU resources he can get]
>
> No. Because in the following code, executed by the housekeeping vCPU
> running at constant SCHED_FIFO priority:
>
> 1. Start disk I/O.
> 2. busy spin
>
> With the emulator thread sharing the same pCPU with the housekeeping
> vCPU, the emulator thread (which runs at SCHED_NORMAL), will never
> be scheduled in place of the vCPU thread at SCHED_FIFO.
>
> This causes a hang.

But if the emulator thread can interrupt the housekeeping thread, the
emulator thread should also be SCHED_FIFO at higher priority; IIRC this
was in Jan's talk from a few years ago.

QEMU would also have to use PI mutexes (which is the main reason why
it's using QemuMutex instead of e.g. GMutex).
>> Yeah, I do not understand why there should be a housekeeping VCPU that
>> is running at SCHED_NORMAL. If it hurts, don't do it...
>
> Hope the explanation above makes sense (in fact, it was you who pointed
> out SCHED_FIFO should not be constant on the housekeeping vCPU,
> when sharing pCPU with emulator thread at SCHED_NORMAL).

The two are not exclusive... As you point out, it depends on the
workload. For DPDK you can put both of them at SCHED_NORMAL. For
kernel-intensive uses you must use SCHED_FIFO.

Perhaps we could consider running these threads at SCHED_RR instead.
Unlike with SCHED_NORMAL, I am not against a hypercall that temporarily
bumps SCHED_RR to SCHED_FIFO, but perhaps that's not even necessary.
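
As a rough userspace illustration of switching between SCHED_RR and
SCHED_FIFO, the sketch below changes the policy of the calling thread with
pthread_setschedparam(). It is not the KVM_HC_RT_PRIO host-side code from
the patch; the helper names clamp_rt_priority and set_self_rt_policy are
hypothetical, and the call needs CAP_SYS_NICE (or a suitable RLIMIT_RTPRIO),
otherwise it typically returns EPERM.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: clamp prio into the valid range for the policy. */
static int clamp_rt_priority(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Hypothetical helper: move the calling thread to the given policy
 * (e.g. SCHED_RR or SCHED_FIFO). Returns 0 on success or an errno
 * value, typically EPERM when the caller lacks the privilege.
 */
static int set_self_rt_policy(int policy, int prio)
{
    struct sched_param sp = {
        .sched_priority = clamp_rt_priority(policy, prio),
    };

    return pthread_setschedparam(pthread_self(), policy, &sp);
}
```

A temporary bump would then be set_self_rt_policy(SCHED_FIFO, prio) before
the critical section and set_self_rt_policy(SCHED_RR, prio) after it, so
the thread goes back to being time-sliced against its peers at the same
priority.
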
Paolo