Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

From: Chen Yu
Date: Wed Sep 07 2022 - 03:28:35 EST


Hi Mel,
On 2022-09-06 at 10:57:17 +0100, Mel Gorman wrote:
> On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6089251a4720..59b27a2ef465 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > > if (sd_share) {
> > > /* because !--nr is the condition to stop scan */
> > > nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> > > - /* overloaded LLC is unlikely to have idle cpu/core */
> > > - if (nr == 1)
> > > - return -1;
> > > +
> > > + /*
> > > + * Non-overloaded case: Scan full domain if there is
> > > + * an idle core. Otherwise, scan for an idle
> > > + * CPU based on nr_idle_scan
> > > + * Overloaded case: Unlikely to have an idle CPU but
> > > + * conduct a limited scan if there is potentially
> > > + * an idle core.
> > > + */
> > > + if (nr > 1) {
> > > + if (has_idle_core)
> > > + nr = sd->span_weight;
> > > + } else {
> > > + if (!has_idle_core)
> > > + return -1;
> > > + nr = 2;
> > > + }
> > > }
> > > }
> > > for_each_cpu_wrap(cpu, cpus, target + 1) {
> > > + if (!--nr)
> > > + break;
> > > +
> > > if (has_idle_core) {
> > > i = select_idle_core(p, cpu, cpus, &idle_cpu);
> > > if ((unsigned int)i < nr_cpumask_bits)
> > > return i;
> > > } else {
> > > - if (!--nr)
> > > - return -1;
> > > idle_cpu = __select_idle_cpu(cpu, p);
> > > if ((unsigned int)idle_cpu < nr_cpumask_bits)
> > > break;
> >
> > I spent the last few days testing this, with 3 variations (assuming
> > has_idle_core):
> >
> > a) full or limited (2 cores) scan when !nr_idle_scan
> > b) whether to clear sds->has_idle_core when the partial scan fails
> > c) whether to scale scan depth with load
> >
> > some observations:
> >
> > 1) It always seems bad not to clear sds->has_idle_core when the
> > partial scan fails, because we keep doing partial scans that
> > still cannot find an idle core. (The following results are based
> > on clearing has_idle_core even after partial scans.)
> >
>
> Ok, that's rational. There will be corner cases where there is no idle
> CPU near the target but there is an idle core far away; however, it
> would be counter to the purpose of SIS_UTIL to care about that corner case.
>
> > 2) An unconditional full scan when has_idle_core is not good for
> > netperf_{udp,tcp} and tbench4. It is probably because the SIS
> > success rate of these workloads is already high enough (netperf
> > ~= 100%, tbench4 ~= 50%, compared to hackbench ~= 3.5%), which
> > negates a lot of the benefit a full scan brings.
> >
>
> That's also rational. For a single client/server on netperf, it's expected
> that the SIS success rate is high and scanning is minimal. As the client
> and server are sharing data on localhost and somewhat synchronous, it may
> even partially benefit from SMT sharing.
>
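Regarding observation 1) above, just to confirm I read variation (b)
correctly: I assume it keeps the has_idle_core clearing at the tail of
select_idle_cpu() even when the scan was cut short by the !--nr limit,
roughly like the sketch below (my guess at the shape of the test code,
not a quote of it):

        /*
         * Tail of select_idle_cpu(): clear the idle-core hint even if
         * the scan above was a partial one that broke out on !--nr,
         * so later wakeups do not keep scanning on a stale hint.
         */
        if (has_idle_core)
                set_idle_cores(target, false);
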
Maybe off-topic: since we monitor the success rate and other metrics
for each optimization in the SIS path, would it be possible to merge
your statistics patch [1] upstream, so we don't need to keep rebasing
it in the future (although it mainly targets kernel development)?

[1] https://lore.kernel.org/lkml/20210726102247.21437-2-mgorman@xxxxxxxxxxxxxxxxxxx/
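
To illustrate what I mean by success rate, conceptually it boils down
to a pair of counters like the following (hypothetical names purely
for illustration, not necessarily the fields or naming used in [1]):

        /*
         * Per-domain SIS accounting sketch: bump sis_search on every
         * attempt in select_idle_cpu()/select_idle_core(), and bump
         * sis_found when an idle CPU is actually returned; the
         * success rate is then sis_found / sis_search over a
         * sampling window.
         */
        unsigned long   sis_search;     /* SIS attempts */
        unsigned long   sis_found;      /* attempts that found an idle CPU */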
[cut]


thanks,
Chenyu