Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"

From: Rik van Riel
Date: Thu Oct 29 2020 - 22:11:12 EST


On Mon, 2020-10-26 at 17:52 +0100, Vincent Guittot wrote:
> On Mon, 26 Oct 2020 at 17:48, Chris Mason <clm@xxxxxx> wrote:
> > On 26 Oct 2020, at 12:20, Vincent Guittot wrote:
> >
> > > what you are suggesting is something like:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 4978964e75e5..3b6fbf33abc2 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -9156,7 +9156,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> > > >          * emptying busiest.
> > > >          */
> > > >         if (local->group_type == group_has_spare) {
> > > > -               if (busiest->group_type > group_fully_busy) {
> > > > +               if ((busiest->group_type > group_fully_busy) &&
> > > > +                   !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
> > > >                         /*
> > > >                          * If busiest is overloaded, try to fill spare
> > > >                          * capacity. This might end up creating spare capacity
> > >
> > > > which also fixes the problem for me and aligns LB with the wakeup
> > > > path regarding migration in the LLC
> >
> > Vincent’s patch on top of 5.10-rc1 looks pretty great:
> >
> > Latency percentiles (usec) runtime 90 (s) (3320 total samples)
> > 50.0th: 161 (1687 samples)
> > 75.0th: 200 (817 samples)
> > 90.0th: 228 (488 samples)
> > 95.0th: 254 (164 samples)
> > *99.0th: 314 (131 samples)
> > 99.5th: 330 (17 samples)
> > 99.9th: 356 (13 samples)
> > min=29, max=358
> >
> > Next we test in prod, which probably won’t have answers until
> > tomorrow. Thanks again Vincent!
>
> Great !
>
> I'm going to run more tests on my setup as well to make sure that it
> doesn't generate unexpected side effects on other kinds of use cases.

We have tested the patch with several pretty demanding
workloads for the past several days, and it seems to
do the trick!

With all the current scheduler code from the Linus tree,
plus this patch on top, performance is as good as it ever
was before with one workload, and slightly better with
the other.
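
For anyone following along, the condition that Vincent's patch
changes can be sketched in userspace C as below. This is only a
minimal sketch under stated assumptions: the types are stand-ins,
the flag uses a placeholder bit rather than the kernel's real
value, and the helper name takes_fill_spare_branch() is
hypothetical and does not exist in kernel/sched/fair.c.

```c
#include <stdbool.h>

/* Stand-in for the scheduler's group_type ordering, least to most loaded. */
enum group_type {
	group_has_spare = 0,
	group_fully_busy,
	group_misfit_task,
	group_asym_packing,
	group_imbalanced,
	group_overloaded,
};

/* Placeholder bit for illustration; the kernel defines the real value. */
#define SD_SHARE_PKG_RESOURCES 0x1

/*
 * Hypothetical helper mirroring the patched test: the "fill spare
 * capacity" branch is taken only when the local group has spare
 * capacity, the busiest group is beyond fully busy, and the domain
 * does not share package resources (that is, it is not an LLC domain).
 */
static bool takes_fill_spare_branch(enum group_type local,
				    enum group_type busiest,
				    int sd_flags)
{
	if (local != group_has_spare)
		return false;

	return busiest > group_fully_busy &&
	       !(sd_flags & SD_SHARE_PKG_RESOURCES);
}
```

With the flag set, the helper returns false even for an overloaded
busiest group, which is the behavioral difference from the old
single-condition test.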

--
All Rights Reversed.
