Re: [RFC PATCH v2] sched/fair: select idle cpu from idle cpumask in sched domain

From: Mel Gorman
Date: Wed Sep 16 2020 - 14:43:24 EST


On Wed, Sep 16, 2020 at 12:31:03PM +0800, Aubrey Li wrote:
> Added idle cpumask to track idle cpus in sched domain. When a CPU
> enters idle, its corresponding bit in the idle cpumask will be set,
> and when the CPU exits idle, its bit will be cleared.
>
> When a task wakes up to select an idle cpu, scanning idle cpumask
> has lower cost than scanning all the cpus in the last level cache domain,
> especially when the system is heavily loaded.
>
> The following benchmarks were tested on an x86 4-socket system with
> 24 cores per socket and 2 hyperthreads per core, total 192 CPUs:
>
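For reference, here is a minimal userspace sketch of the bookkeeping the changelog describes. A bit is set on idle entry and cleared on idle exit, and the wakeup path looks for the first set bit. It assumes a flat 192-bit array in place of the kernel's cpumask API. The names (struct idle_mask, idle_mask_set(), idle_mask_clear(), idle_mask_first()) are hypothetical. It ignores the concurrency the real patch has to deal with, and __builtin_ctzll assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizing for the 4 socket x 24 core x 2 thread box: 192 CPUs. */
#define NR_CPUS 192
#define BITS_PER_WORD 64
#define NR_WORDS ((NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Flat bitmap standing in for a per-domain idle cpumask (not the kernel API). */
struct idle_mask {
	uint64_t bits[NR_WORDS];
};

/* Idle entry: set this CPU's bit. */
static void idle_mask_set(struct idle_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->bits[cpu / BITS_PER_WORD] |= UINT64_C(1) << (cpu % BITS_PER_WORD);
}

/* Idle exit: clear this CPU's bit. */
static void idle_mask_clear(struct idle_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->bits[cpu / BITS_PER_WORD] &= ~(UINT64_C(1) << (cpu % BITS_PER_WORD));
}

/*
 * Wakeup: return the lowest-numbered CPU whose bit is set, or -1 if none.
 * When the machine is heavily loaded the mask is mostly zero, so this is
 * a handful of word tests rather than one check per CPU in the LLC.
 */
static int idle_mask_first(const struct idle_mask *m)
{
	for (int w = 0; w < NR_WORDS; w++) {
		if (m->bits[w])
			return w * BITS_PER_WORD + __builtin_ctzll(m->bits[w]);
	}
	return -1;
}
```

With CPUs 70 and 130 marked idle, idle_mask_first() returns 70. Once CPU 70 exits idle, it returns 130.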

This still appears to be tied to turning the tick off. An idle CPU
that is available for computation does not necessarily have the tick
turned off if it is only idle for short periods of time. When nohz is
disabled, or a machine is active enough that CPUs are not stopping the
tick, select_idle_cpu may fail to select an idle CPU and instead stack
tasks on the previous CPU.
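The tick concern can be modelled in a few lines. This sketch rests on one assumption taken from the review rather than from the patch itself: the mask bit is only set once the tick is stopped. The names CPU_IDLE_TICK, struct llc, select_from_mask() and select_by_full_scan() are hypothetical. The full scan stands in for the existing behaviour of checking each CPU in the LLC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 8-CPU LLC, small enough to reason about by hand. */
#define NR_LLC_CPUS 8

enum cpu_state {
	CPU_RUNNING,    /* running a normal task */
	CPU_IDLE_TICK,  /* idle, but the tick is still running (short idle) */
	CPU_IDLE_NOHZ,  /* idle with the tick stopped */
};

struct llc {
	enum cpu_state state[NR_LLC_CPUS];
	bool idle_mask[NR_LLC_CPUS];
};

/* Assumption: the mask bit tracks "tick stopped", not "idle". */
static void set_state(struct llc *l, int cpu, enum cpu_state s)
{
	assert(cpu >= 0 && cpu < NR_LLC_CPUS);
	l->state[cpu] = s;
	l->idle_mask[cpu] = (s == CPU_IDLE_NOHZ);
}

/* Scan only the mask; fall back to prev_cpu (stacking) if it is empty. */
static int select_from_mask(const struct llc *l, int prev_cpu)
{
	for (int cpu = 0; cpu < NR_LLC_CPUS; cpu++)
		if (l->idle_mask[cpu])
			return cpu;
	return prev_cpu;
}

/* Check every CPU for actual idleness, whether or not the tick stopped. */
static int select_by_full_scan(const struct llc *l, int prev_cpu)
{
	for (int cpu = 0; cpu < NR_LLC_CPUS; cpu++)
		if (l->state[cpu] != CPU_RUNNING)
			return cpu;
	return prev_cpu;
}
```

With CPU 5 idle but its tick still running, select_from_mask() falls back to prev_cpu 2, while the full scan finds CPU 5. With nohz disabled, no CPU ever reaches CPU_IDLE_NOHZ, so the mask-only path always stacks on prev_cpu.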

The other subtlety is that select_idle_sibling() currently allows a
SCHED_IDLE CPU to be used as a wakeup target. Such a CPU is not really
idle; it is simply running a low-priority task that is suitable for
preemption. I suspect this patch breaks that.
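The SCHED_IDLE case can be modelled the same way. cpu_is_sched_idle() is based on the kernel's sched_idle_cpu() check, which is true when every runnable task on the CPU is SCHED_IDLE. The names struct rq_model, select_usable() and select_from_idle_mask() are hypothetical. The mask-only path assumes, as suspected above, that a CPU running a SCHED_IDLE task never has its idle bit set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 4-CPU LLC. */
#define NR_LLC_CPUS 4

struct rq_model {
	int nr_running;     /* runnable tasks on this CPU */
	int nr_sched_idle;  /* how many of them are SCHED_IDLE */
};

/* Truly idle: nothing runnable, so the idle task would be running. */
static bool cpu_is_idle(const struct rq_model *rq)
{
	return rq->nr_running == 0;
}

/* Modelled on sched_idle_cpu(): runnable tasks exist and all are SCHED_IDLE. */
static bool cpu_is_sched_idle(const struct rq_model *rq)
{
	assert(rq->nr_sched_idle <= rq->nr_running);
	return rq->nr_running > 0 && rq->nr_running == rq->nr_sched_idle;
}

/* Current behaviour: an idle or SCHED_IDLE CPU is an acceptable target. */
static int select_usable(const struct rq_model rqs[], int prev)
{
	for (int cpu = 0; cpu < NR_LLC_CPUS; cpu++)
		if (cpu_is_idle(&rqs[cpu]) || cpu_is_sched_idle(&rqs[cpu]))
			return cpu;
	return prev;
}

/* Mask-only behaviour: only CPUs that actually entered idle are visible. */
static int select_from_idle_mask(const struct rq_model rqs[], int prev)
{
	for (int cpu = 0; cpu < NR_LLC_CPUS; cpu++)
		if (cpu_is_idle(&rqs[cpu]))
			return cpu;
	return prev;
}
```

Suppose CPU 2 runs only a SCHED_IDLE task and no CPU is truly idle. The current logic picks CPU 2, whereas the mask-only path finds nothing and falls back to prev.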

--
Mel Gorman
SUSE Labs