Re: [RFC][PATCH 00/16] sched: Core scheduling
From: Paolo Bonzini
Date: Thu Mar 07 2019 - 17:06:42 EST
On 22/02/19 15:10, Peter Zijlstra wrote:
>> I agree on not bike shedding about the API, but can we agree on some of
>> the high level properties? For example, who generates the core
>> scheduling ids, what properties about them are enforced, etc.?
> It's an opaque cookie; the scheduler really doesn't care. All it does is
> ensure that tasks match or force idle within a core.
>
> My previous patches got the cookie from a modified
> preempt_notifier_register/unregister() which passed the vcpu->kvm
> pointer into it from vcpu_load/put.
>
> This auto-grouped VMs. It was also found to be somewhat annoying because
> apparently KVM does a lot of userspace assist for all sorts of nonsense
> and it would leave/re-join the cookie group for every single assist.
> Causing tons of rescheduling.
KVM doesn't do _that much_ userspace exiting in practice when VMs are
properly configured (if they're not, you probably don't care about core
scheduling).
However, note that KVM needs core scheduling groups to be defined at the
thread level; one group per process is not enough. A VM has a bunch of
I/O threads and vCPU threads, and we want to set up core scheduling like
this:
+--------------------------------------+
| VM 1   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   | vCPU2  vCPU3    | |
| +----------------+-----------------+ |
+--------------------------------------+

+--------------------------------------+
| VM 1   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   | vCPU2  vCPU3    | |
| +----------------+-----------------+ |
| | vCPU4  vCPU5   | vCPU6  vCPU7    | |
| +----------------+-----------------+ |
+--------------------------------------+
where the iothreads need not be subject to core scheduling but the vCPUs
do. If you don't place guest-sibling vCPUs in the same core scheduling
group, the host is free to pair vCPUs from different guest cores on one
physical core, and bad things happen.
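
A minimal sketch of the grouping in the diagrams, in Python purely for
illustration. The function name assign_cookies, the (vm, guest_core) tuple
used as a cookie, and the assumption of two vCPUs per guest core are all
hypothetical; none of them come from the patch series, where the cookie is
opaque.

```python
from typing import Dict, List, Optional, Tuple

Cookie = Optional[Tuple[str, int]]  # None means "not tagged for core scheduling"


def assign_cookies(vm: str, vcpus: List[str], iothreads: List[str],
                   threads_per_core: int = 2) -> Dict[str, Cookie]:
    """Give each pair of guest-sibling vCPUs its own cookie; leave iothreads untagged."""
    cookies: Dict[str, Cookie] = {}
    for name in iothreads:
        cookies[name] = None  # assumption: iothreads stay outside any group
    for index, name in enumerate(vcpus):
        # vCPUs 0-1 form guest core 0, vCPUs 2-3 form guest core 1, and so on.
        cookies[name] = (vm, index // threads_per_core)
    return cookies


cookies = assign_cookies("VM1",
                         ["vCPU0", "vCPU1", "vCPU2", "vCPU3"],
                         ["iothread1", "iothread2"])
```

With this layout vCPU0 and vCPU1 share a cookie, vCPU2 and vCPU3 share a
different one, and the iothreads carry none, which is why one group per
process cannot express it.
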
The reason is that the guest might also be running a core scheduler, so
you could have:
- guest process 1 registering two threads A and B in the same group
- guest process 2 registering two threads C and D in the same group
- guest core scheduler placing thread A on vCPU0, thread B on vCPU1,
  thread C on vCPU2, thread D on vCPU3
- host core scheduler deciding the four threads can be in physical cores
  0-1, but physical core 0 gets A+C and physical core 1 gets B+D
- now process 2 shares cache with process 1. :(
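
The sequence above can be checked with a small model, again in Python and
only as a sketch. The helper names host_allows and mixed_cores and the two
cookie tables are hypothetical; the model assumes the host only requires
that every task on a physical core carries the same cookie.

```python
# Guest view from the example: which guest process runs on each vCPU.
guest_proc_on_vcpu = {"vCPU0": "proc1", "vCPU1": "proc1",
                      "vCPU2": "proc2", "vCPU3": "proc2"}

# Host placement from the example: physical core -> vCPUs on its hyperthreads.
bad_placement = {0: ["vCPU0", "vCPU2"], 1: ["vCPU1", "vCPU3"]}


def host_allows(placement, cookie):
    """True if every physical core only holds vCPUs with one common cookie."""
    return all(len({cookie[v] for v in vcpus}) == 1
               for vcpus in placement.values())


def mixed_cores(placement, proc_on_vcpu):
    """Physical cores whose hyperthreads run different guest processes."""
    return [core for core, vcpus in placement.items()
            if len({proc_on_vcpu[v] for v in vcpus}) > 1]


# One cookie for the whole VM versus one cookie per guest core.
per_vm = {v: "VM1" for v in guest_proc_on_vcpu}
per_guest_core = {"vCPU0": ("VM1", 0), "vCPU1": ("VM1", 0),
                  "vCPU2": ("VM1", 1), "vCPU3": ("VM1", 1)}

print(host_allows(bad_placement, per_vm))              # True
print(host_allows(bad_placement, per_guest_core))      # False
print(mixed_cores(bad_placement, guest_proc_on_vcpu))  # [0, 1]
```

With a single per-VM cookie the host accepts the A+C / B+D placement even
though both physical cores mix the two guest processes; with one cookie per
guest core the same placement is rejected.
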
Paolo