Re: [RFC PATCH v2] sched/fair: select idle cpu from idle cpumask in sched domain

From: Valentin Schneider
Date: Wed Sep 16 2020 - 16:00:02 EST



On 16/09/20 12:00, Mel Gorman wrote:
> On Wed, Sep 16, 2020 at 12:31:03PM +0800, Aubrey Li wrote:
>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>> enters idle, its corresponding bit in the idle cpumask will be set,
>> and when the CPU exits idle, its bit will be cleared.
>>
>> When a task wakes up to select an idle cpu, scanning idle cpumask
>> has low cost than scanning all the cpus in last level cache domain,
>> especially when the system is heavily loaded.
>>
>> The following benchmarks were tested on a x86 4 socket system with
>> 24 cores per socket and 2 hyperthreads per core, total 192 CPUs:
>>
>
> This still appears to be tied to turning the tick off. An idle CPU
> available for computation does not necessarily have the tick turned off
> if it's for short periods of time. When nohz is disabled or a machine is
> active enough that CPUs are not disabling the tick, select_idle_cpu may
> fail to select an idle CPU and instead stack tasks on the old CPU.
>

Vincent was pointing out in v1 that we ratelimit nohz_balance_exit_idle()
by having it happen on a tick to prevent being hammered by a flurry of
idle enter / exit sub tick granularity. I'm afraid flipping bits of this
cpumask on idle enter / exit might be too brutal.

> The other subtlety is that select_idle_sibling() currently allows a
> SCHED_IDLE cpu to be used as a wakeup target. The CPU is not really
> idle as such, it's simply running a low priority task that is suitable
> for preemption. I suspect this patch breaks that.

I think you're spot on.

An alternative I see here would be to move this into its own
select_idle_foo() function. If that mask is empty or none of the tagged
CPUs actually pass available_idle_cpu(), we fall-through to the usual idle
searches.

That's far from perfect; you could wake a truly idle CPU instead of
preempting a SCHED_IDLE task on a warm and busy CPU. I'm not sure if a
proliferation of cpumask really is the answer to that...