Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

From: Mel Gorman
Date: Tue Sep 06 2022 - 06:10:54 EST


On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6089251a4720..59b27a2ef465 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  		if (sd_share) {
> >  			/* because !--nr is the condition to stop scan */
> >  			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> > -			/* overloaded LLC is unlikely to have idle cpu/core */
> > -			if (nr == 1)
> > -				return -1;
> > +
> > +			/*
> > +			 * Non-overloaded case: Scan full domain if there is
> > +			 *	an idle core. Otherwise, scan for an idle
> > +			 *	CPU based on nr_idle_scan
> > +			 * Overloaded case: Unlikely to have an idle CPU but
> > +			 *	conduct a limited scan if there is potentially
> > +			 *	an idle core.
> > +			 */
> > +			if (nr > 1) {
> > +				if (has_idle_core)
> > +					nr = sd->span_weight;
> > +			} else {
> > +				if (!has_idle_core)
> > +					return -1;
> > +				nr = 2;
> > +			}
> >  		}
> >  	}
> >  	for_each_cpu_wrap(cpu, cpus, target + 1) {
> > +		if (!--nr)
> > +			break;
> > +
> >  		if (has_idle_core) {
> >  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
> >  			if ((unsigned int)i < nr_cpumask_bits)
> >  				return i;
> >  		} else {
> > -			if (!--nr)
> > -				return -1;
> >  			idle_cpu = __select_idle_cpu(cpu, p);
> >  			if ((unsigned int)idle_cpu < nr_cpumask_bits)
> >  				break;
>
> I spent the last few days testing this, with 3 variations (assume
> has_idle_core):
>
> a) full or limited (2 cores) scan when !nr_idle_scan
> b) whether to clear sds->has_idle_core when a partial scan fails
> c) whether to scale scan depth with load
>
> some observations:
>
> 1) It always seems bad not to clear sds->has_idle_core when a
> partial scan fails: we have already over-scanned yet still
> could not find an idle core. (The following results are
> based on clearing has_idle_core even after partial scans.)
>

Ok, that's rational. There will be corner cases where there is no idle
CPU near the target while an idle core exists far away, but it would be
counter to the purpose of SIS_UTIL to care about that corner case.

> 2) An unconditional full scan when has_idle_core is not good
> for netperf_{udp,tcp} and tbench4. This is probably because
> the SIS success rate of these workloads is already high
> enough (netperf ~= 100%, tbench4 ~= 50%, compared to
> hackbench ~= 3.5%), which negates much of the benefit the
> full scan brings.
>

That's also rational. For a single client/server on netperf, it's expected
that the SIS success rate is high and scanning is minimal. As the client
and server are sharing data on localhost and somewhat synchronous, it may
even partially benefit from SMT sharing.

So basic approach would be "always clear sds->has_idle_core" + "limit
scan even when has_idle_core is true", right?

If so, stick a changelog on it and resend!

> 3) Scaling scan depth with load seems good for the hackbench
> socket tests, and neutral in the pipe tests. I think this is
> the case you mentioned before: under fast wake-up workloads
> has_idle_core becomes less reliable, so a full scan won't
> always win.
>

My recommendation is to leave this out for now but, assuming the rest of
the patches get picked up, consider posting it for the next major kernel
version (i.e. separate the basic and clever approaches by one major kernel
version). By separating them, there is less chance of a false positive
bisection pointing to the wrong patch. Any regression will not be perfectly
reproducible, so the chances of a false positive bisection are relatively
high.

--
Mel Gorman
SUSE Labs