Re: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)
From: Steven Rostedt
Date: Fri Jul 12 2024 - 13:14:22 EST
On Fri, 12 Jul 2024 09:44:16 -0700
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > All we need is a notifier that gets called at every VMEXIT.
>
> Why? The only argument I've seen for needing to hook VM-Exit is so that the
> host can speculatively boost the priority of the vCPU when delivering an IRQ,
> but (a) I'm unconvinced that is necessary, i.e. that the vCPU needs to be boosted
> _before_ the guest IRQ handler is invoked and (b) it has almost no benefit on
> modern hardware that supports posted interrupts and IPI virtualization, i.e. for
> which there will be no VM-Exit.
No. The speculative boost was for something else, but slightly
related. I guess the idea there was to have the incoming interrupt
boost the vCPU because the interrupt could be waking an RT task. It may
still be something needed, but that's not what I'm talking about here.
The idea here is that when an RT task is scheduled in on the guest, we
want to lazily boost the vCPU. As long as the vCPU is running on the
CPU, we do not need to do anything. If the RT task is scheduled for a
very short time, the guest should not need to make any hypercall. It
would set the shared memory to the new priority when the RT task is
scheduled in, and then put back the lower priority when it is scheduled
out and a SCHED_OTHER task is scheduled in.
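To make the guest side concrete, here is a rough userspace sketch of
that fast path. Everything in it is made up for illustration (struct
pv_prio_slot, guest_publish_prio(), the hypercall counter); it is not
the code from the patch series, just the shape of the idea: a context
switch only does a store to shared memory and never traps to the host.

```c
#include <assert.h>
#include <stdatomic.h>

enum { PRIO_OTHER = 0 };        /* stands in for SCHED_OTHER: no boost wanted */

/* Hypothetical per-vCPU slot in memory shared between guest and host. */
struct pv_prio_slot {
	_Atomic int guest_prio; /* priority of the task now running on the vCPU */
};

/* Counts hypercalls in this model; the fast path must leave it at zero. */
static int hypercalls_made;

/*
 * Guest: called on every context switch with the priority of the task
 * being scheduled in. It only publishes the value; the host reads it
 * later, and only if it ends up preempting the vCPU.
 */
void guest_publish_prio(struct pv_prio_slot *slot, int prio)
{
	assert(slot);
	atomic_store_explicit(&slot->guest_prio, prio, memory_order_release);
}
```

A short-lived RT task is then just guest_publish_prio(slot, rt_prio)
followed by guest_publish_prio(slot, PRIO_OTHER), with no exit to the
host in between.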
Now if the vCPU gets preempted, it is at this moment that we need the
host kernel to look at the current priority of the task running on
the vCPU. If it is an RT task, we need to boost the vCPU to that
priority, so that a lower priority host thread does not interrupt it.
The host should also set a bit in the shared memory to tell the guest
that it was boosted. Then when the vCPU schedules a lower priority task
than what is in shared memory, and the bit is set that tells the guest
the host boosted the vCPU, it needs to make a hypercall to tell the
host that it can lower its priority again.
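The handshake could look roughly like the model below. All names are
hypothetical (struct pv_sched_shared, host_on_vcpu_preempt(),
guest_sched_in(), host_unboost()), the "hypercall" is an ordinary
function call, and passing the host's thread state into the guest
function is purely an artifact of running both sides in one program.
Treat it as a sketch of the protocol, not of the actual patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { PRIO_OTHER = 0 };          /* stands in for SCHED_OTHER: no boost */

/* Hypothetical per-vCPU area shared between guest and host. */
struct pv_sched_shared {
	_Atomic int  guest_prio;   /* written by the guest on context switch */
	_Atomic bool host_boosted; /* set by the host when it boosts the vCPU */
};

/* Stand-in for the host's view of the vCPU thread. */
struct vcpu_thread {
	int host_prio;
};

static int unboost_hypercalls;    /* how often the guest had to call out */

/* Host: run at the moment the vCPU thread is preempted. */
void host_on_vcpu_preempt(struct vcpu_thread *t, struct pv_sched_shared *sh)
{
	assert(t && sh);
	int prio = atomic_load_explicit(&sh->guest_prio, memory_order_acquire);

	if (prio > PRIO_OTHER && prio > t->host_prio) {
		t->host_prio = prio;  /* boost to the guest task's priority */
		atomic_store_explicit(&sh->host_boosted, true,
				      memory_order_release);
	}
}

/* Host: handler for the (hypothetical) "you may unboost me" hypercall. */
void host_unboost(struct vcpu_thread *t, struct pv_sched_shared *sh)
{
	unboost_hypercalls++;
	t->host_prio = atomic_load(&sh->guest_prio);
}

/* Guest: run on a context switch to a task with priority new_prio. */
void guest_sched_in(struct vcpu_thread *t, struct pv_sched_shared *sh,
		    int new_prio)
{
	int old = atomic_exchange(&sh->guest_prio, new_prio);

	/* Only pay for a hypercall if the host really boosted us. */
	if (new_prio < old && atomic_exchange(&sh->host_boosted, false))
		host_unboost(t, sh);
}
```

If the RT task comes and goes without the vCPU being preempted,
host_boosted is never set and the guest never makes the call.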
The boost on an incoming IRQ is to handle the race between the event
that wakes the RT task, and the RT task getting a chance to run. If the
preemption happens there, the vCPU may never have a chance to notify the
host that it wants to run an RT task.
-- Steve