On 09/25/2012 10:09 AM, Raghavendra K T wrote:
On 09/24/2012 09:36 PM, Avi Kivity wrote:
On 09/24/2012 05:41 PM, Avi Kivity wrote:
case 2)
rq1 : vcpu1->wait(lockA) (spinning)
rq2 : vcpu3 (running) , vcpu2->holding(lockA) [scheduled out]
I agree that checking rq1's length is not proper in this case, and as
you rightly pointed out, we are in trouble here.
nr_running()/num_online_cpus() would give a more accurate picture here,
but it seemed costly. Maybe the load balancer saves us a bit here by
not running into such cases. (I agree the load balancer is far too
complex.)
In theory a preempt notifier can tell us whether a vcpu is preempted or
not (except for exits to userspace), so we can keep track of whether
we're overcommitted in kvm itself. It also avoids false positives
from other guests and/or processes being overcommitted while our vm
is fine.
It also allows us to cheaply skip running vcpus.
Hi Avi,
Could you please elaborate on how preempt notifiers can be used
here to keep track of overcommit or to skip running vcpus?
Are we planning to set some flag in the sched_out() handler, etc.?
Keep a bitmap kvm->preempted_vcpus.
In sched_out, test whether we're TASK_RUNNING, and if so, set a vcpu
flag and our bit in kvm->preempted_vcpus. On sched_in, if the flag is
set, clear our bit in kvm->preempted_vcpus. We can also keep a counter
of preempted vcpus.
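As a rough, untested sketch: the preempted_vcpus bitmap, the
preempted_count counter, and the vcpu->preempted flag are all
hypothetical new fields here; only the notifier entry points mirror
the existing kvm_sched_in()/kvm_sched_out() in virt/kvm/kvm_main.c.

static void kvm_sched_out(struct preempt_notifier *pn,
			  struct task_struct *next)
{
	struct kvm_vcpu *vcpu = container_of(pn, struct kvm_vcpu,
					     preempt_notifier);

	/* Still runnable => we were involuntarily preempted. */
	if (current->state == TASK_RUNNING) {
		vcpu->preempted = true;	/* hypothetical flag */
		set_bit(vcpu->vcpu_id, vcpu->kvm->preempted_vcpus);
		atomic_inc(&vcpu->kvm->preempted_count);
	}
}

static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = container_of(pn, struct kvm_vcpu,
					     preempt_notifier);

	if (vcpu->preempted) {
		vcpu->preempted = false;
		clear_bit(vcpu->vcpu_id, vcpu->kvm->preempted_vcpus);
		atomic_dec(&vcpu->kvm->preempted_count);
	}
}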
We can use the bitmap and the counter to quickly see if spinning is
worthwhile (if the counter is zero, better to spin). If not, we can use
the bitmap to select target vcpus quickly.
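Something like the following in the PLE/spin path, where
kvm_yield_to_preempted() is a made-up helper name and
kvm_for_each_vcpu()/kvm_vcpu_yield_to() are the existing primitives:

static void kvm_yield_to_preempted(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	int idx;

	/* Nobody preempted: the lock holder is running, keep spinning. */
	if (!atomic_read(&kvm->preempted_count))
		return;

	/* Otherwise direct the yield at a vcpu that is actually out. */
	kvm_for_each_vcpu(idx, vcpu, kvm) {
		if (vcpu == me)
			continue;
		if (test_bit(vcpu->vcpu_id, kvm->preempted_vcpus) &&
		    kvm_vcpu_yield_to(vcpu) > 0)
			break;
	}
}

If the counter is zero we return immediately and the caller keeps
spinning; otherwise the bitmap narrows the directed yield to vcpus
that were preempted while runnable, instead of probing every vcpu.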
The only problem is that in order to keep this accurate we need to keep
the preempt notifiers active during exits to userspace. But we can
prototype without that change, and add it later if it works.