Re: [RFC PATCH v2] sched/fair: select idle cpu from idle cpumask in sched domain

From: Vincent Guittot
Date: Wed Sep 16 2020 - 14:36:36 EST


On Wed, 16 Sep 2020 at 13:00, Mel Gorman <mgorman@xxxxxxx> wrote:
>
> On Wed, Sep 16, 2020 at 12:31:03PM +0800, Aubrey Li wrote:
> > Added idle cpumask to track idle cpus in sched domain. When a CPU
> > enters idle, its corresponding bit in the idle cpumask will be set,
> > and when the CPU exits idle, its bit will be cleared.
> >
> > When a task wakes up to select an idle cpu, scanning idle cpumask
> > has low cost than scanning all the cpus in last level cache domain,
> > especially when the system is heavily loaded.
> >
> > The following benchmarks were tested on a x86 4 socket system with
> > 24 cores per socket and 2 hyperthreads per core, total 192 CPUs:
> >
>
> This still appears to be tied to turning the tick off. An idle CPU
> available for computation does not necessarily have the tick turned off
> if it's for short periods of time. When nohz is disabled or a machine is
> active enough that CPUs are not disabling the tick, select_idle_cpu may
> fail to select an idle CPU and instead stack tasks on the old CPU.
>
> The other subtlety is that select_idle_sibling() currently allows a
> SCHED_IDLE cpu to be used as a wakeup target. The CPU is not really
> idle as such, it's simply running a low priority task that is suitable
> for preemption. I suspect this patch breaks that.

Yes, good point. I completely missed this

>
> --
> Mel Gorman
> SUSE Labs