Re: [RFC] sched/core: Don't schedule threads on pre-empted vcpus

From: Steven Sistare
Date: Fri May 04 2018 - 13:32:46 EST


On 5/4/2018 1:22 PM, Rohit Jain wrote:
> Hi Peter,
>
> On 05/04/2018 02:47 AM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 01:52:10PM -0700, Rohit Jain wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 5e10aae..75d1ecf 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -4033,6 +4033,9 @@ int idle_cpu(int cpu)
>>>          return 0;
>>>  #endif
>>>
>>> +    if (vcpu_is_preempted(cpu))
>>> +        return 0;
>>> +
>>>      return 1;
>>>  }
>> Basically OK with this, but did you consider idle_cpu() usage outside of
>> select_idle_sibling()?
>>
>> For instance, I think got_nohz_idle_kick() isn't quite right with this
>> on. Similarly for scheduler_tick(), that wants the actual idle state.
>
> As far as intent is concerned, yes I agree you might be right. I left
> the VM running for a couple of days, didn't see anything weird however.
>
> We could add a check at each of those places or something to that effect
> if this is an issue. Please let me know how you want to proceed.

The point is that some idle_cpu() call sites should consider preemption state
and some should not, so each one must be evaluated case by case. You
could define a new accessor to abstract the difference, and call it from
select_idle_sibling() and anywhere else it makes sense.

int available_idle_cpu(int cpu)
{
    /* Idle from the scheduler's view, and the vCPU is actually running. */
    return idle_cpu(cpu) && !vcpu_is_preempted(cpu);
}
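
To illustrate the split between the two kinds of call site, here is a minimal
user-space sketch, not kernel code. The cpu_state table, NR_CPUS value, and
pick_available_idle_cpu() helper are hypothetical stand-ins invented for the
example; the stubbed idle_cpu() and vcpu_is_preempted() only mimic the return
conventions of the real functions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical, for the example only */

/* Hypothetical per-CPU state standing in for the kernel's runqueue data. */
struct cpu_state {
	bool running_idle_task;	/* nothing runnable on this (v)CPU */
	bool vcpu_preempted;	/* hypervisor has descheduled this vCPU */
};

static struct cpu_state cpus[NR_CPUS] = {
	{ .running_idle_task = true,  .vcpu_preempted = false },
	{ .running_idle_task = true,  .vcpu_preempted = true  },
	{ .running_idle_task = false, .vcpu_preempted = false },
	{ .running_idle_task = false, .vcpu_preempted = true  },
};

/* Stub: the "actual idle state", unchanged by preemption. */
static int idle_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpus[cpu].running_idle_task;
}

/* Stub: whether the hypervisor has preempted this vCPU. */
static bool vcpu_is_preempted(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpus[cpu].vcpu_preempted;
}

/* The proposed accessor: idle and able to run a task right now. */
static int available_idle_cpu(int cpu)
{
	return idle_cpu(cpu) && !vcpu_is_preempted(cpu);
}

/*
 * Hypothetical placement-style caller: wants a CPU that can start
 * running a woken task immediately, so it uses the new accessor.
 * Returns -1 if no such CPU exists.
 */
static int pick_available_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (available_idle_cpu(cpu))
			return cpu;
	return -1;
}
```

With this table, cpu1 is idle but preempted: idle_cpu(1) still reports it as
idle, which is what tick or nohz-style callers want, while
available_idle_cpu(1) rejects it for task placement.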

- Steve