Re: [RFC] sched/core: Don't schedule threads on pre-empted vcpus

From: Rohit Jain
Date: Fri May 04 2018 - 13:38:15 EST


Hi Steve,


On 05/04/2018 10:32 AM, Steven Sistare wrote:
On 5/4/2018 1:22 PM, Rohit Jain wrote:
Hi Peter,

On 05/04/2018 02:47 AM, Peter Zijlstra wrote:
On Wed, May 02, 2018 at 01:52:10PM -0700, Rohit Jain wrote:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e10aae..75d1ecf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4033,6 +4033,9 @@ int idle_cpu(int cpu)
 		return 0;
 #endif
 
+	if (vcpu_is_preempted(cpu))
+		return 0;
+
 	return 1;
 }
Basically OK with this, but did you consider idle_cpu() usage outside of
select_idle_sibling()?

For instance, I think got_nohz_idle_kick() isn't quite right with this
on. Similarly for scheduler_tick(), that wants the actual idle state.
As far as intent is concerned, yes I agree you might be right. I left
the VM running for a couple of days, didn't see anything weird however.

We could add a check at each of those places or something to that effect
if this is an issue. Please let me know how you want to proceed.
The point is that some idle_cpu() call sites should consider preemption state
and some should not, and they must be considered on a case by case basis. You
could define a new accessor to abstract the difference, and call it from
select_idle_sibling and anywhere else it makes sense.

int available_idle_cpu(int cpu)
{
	return idle_cpu(cpu) && !vcpu_is_preempted(cpu);
}
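
To make the call-site distinction concrete, here is a minimal userspace sketch, not kernel code. The arrays cpu_idle_state and vcpu_preempted_state, the SKETCH_NR_CPUS constant, and the helper first_available_idle_cpu() are hypothetical stand-ins invented for illustration; in the kernel, idle_cpu() inspects the runqueue and vcpu_is_preempted() is supplied by the architecture/hypervisor code. The sketch only shows that idle_cpu() keeps reporting the actual idle state, while a separate accessor adds the preemption check for placement decisions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-ins for per-CPU state; not how the kernel stores it. */
bool cpu_idle_state[SKETCH_NR_CPUS];
bool vcpu_preempted_state[SKETCH_NR_CPUS];

/* Stub: reports only the actual idle state, as tick/nohz paths expect. */
int idle_cpu(int cpu)
{
	return cpu_idle_state[cpu];
}

/* Stub: reports whether the hypervisor has preempted this vCPU. */
bool vcpu_is_preempted(int cpu)
{
	return vcpu_preempted_state[cpu];
}

/* Accessor for placement decisions: idle and currently backed by a pCPU. */
int available_idle_cpu(int cpu)
{
	if (!idle_cpu(cpu))
		return 0;

	if (vcpu_is_preempted(cpu))
		return 0;

	return 1;
}

/* Hypothetical placement helper: first CPU that is idle and not preempted. */
int first_available_idle_cpu(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return -1;	/* no suitable CPU found */
}
```

With this split, a caller such as select_idle_sibling() would use available_idle_cpu(), while callers that want the true idle state keep calling idle_cpu() unchanged.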

Great! That's what I was thinking as "something to that effect" :)

Thanks,
Rohit