Re: [PATCH] sched: allow newidle balancing to bail out of load_balance

From: Vincent Guittot
Date: Fri Jun 10 2022 - 04:10:45 EST


On Thu, 9 Jun 2022 at 21:40, Josh Don <joshdon@xxxxxxxxxx> wrote:
>
> Thanks Vincent,
>
> On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@xxxxxxxxxx> wrote:
> > >
> > > While doing newidle load balancing, it is possible for new tasks to
> > > arrive, such as with pending wakeups. newidle_balance() already accounts
> > > for this by exiting the sched_domain load_balance() iteration if it
> > > detects these cases. This is very important for minimizing wakeup
> > > latency.
> > >
> > > However, if we are already in load_balance(), we may stay there for a
> > > while before returning back to newidle_balance(). This is most
> > > exacerbated if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A
> > > very straightforward workaround to this is to adjust should_we_balance()
> > > to bail out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > > detected.
> >
> > This one is close to the other tests and I wonder if it should be
> > better placed before taking the busiest rq lock and detaching some
> > tasks.
> >
> > Beside your use case where all other threads can't move in local cpu
> > and load_balance() loops and clears other cpus, most of the time is
> > probably spent in fbg() and fbq() so there are more chance that a task
> > woke in this meantime and I imagine that it becomes useless to take
> > lock and move tasks from another cpu if the local cpu is no more newly
> > idle.
> >
> > Have you tried other places in load_balance() and does this one
> > provide the lowest wakeup latency ?
> >
> > That being said, the current patch makes sense.
>
> I tested with another check after fbg/fbq and there wasn't any
> noticeable improvement to observed wakeup latency (not totally
> unexpected, since it only helps for wakeups that come during fbg/fbq).

ok. so IIUC the wakeup has already happened when we start
load_balance() in your case so the additional test is useless in your
case

> However, I don't think there's any harm in having that extra check in
> the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
> we can. fbq+fbg are together taking ~3-4us per iteration in my repro.
>
> If there are no objections I can send a v2 with the added delta:

Would be good to get figures that show some benefits of this
additional check for some benchmarks

So I think that we can stay with your current proposal for now

>
> @@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> goto out_balanced;
> }
>
> + /*
> + * fbg/fbq can take a while. In the newly idle case, recheck whether
> + * we should continue with balancing, since it is possible that a
> + * task woke up in the interim.
> + */
> + if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
> + *continue_balancing = 0;
> + goto out_balanced;
> + }
> +
> BUG_ON(busiest == env.dst_rq);
>
> schedstat_add(sd->lb_imbalance[idle], env.imbalance);