Re: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB
From: Joel Fernandes
Date: Fri Oct 06 2023 - 12:47:25 EST
Hi Vincent,
On Fri, Oct 06, 2023 at 03:46:44PM +0200, Vincent Guittot wrote:
[...]
> > ---
> > kernel/sched/fair.c | 21 ++++++++++++++-------
> > 1 file changed, 14 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index cb225921bbca..2ece55f32782 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -11786,13 +11786,12 @@ void nohz_balance_enter_idle(int cpu)
> > /*
> > * Ensures that if nohz_idle_balance() fails to observe our
> > * @idle_cpus_mask store, it must observe the @has_blocked
> > - * and @needs_update stores.
> > + * stores.
> > */
> > smp_mb__after_atomic();
> >
> > set_cpu_sd_state_idle(cpu);
> >
> > - WRITE_ONCE(nohz.needs_update, 1);
>
> the set of needs_updat here is the main way to set nohz.needs_update
> and trigger an update of next_balance so it would be good to explain
> why we still need to keep it instead r removing it entirely
Ok, we are thinking of getting rid of it as Ingo suggested.
> > out:
> > /*
> > * Each time a cpu enter idle, we assume that it has blocked load and
> > @@ -11945,21 +11944,25 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> > }
> >
> > /*
> > - * Check if we need to run the ILB for updating blocked load before entering
> > - * idle state.
> > + * Check if we need to run the ILB for updating blocked load and/or updating
> > + * nohz.next_balance before entering idle state.
> > */
> > void nohz_run_idle_balance(int cpu)
> > {
> > unsigned int flags;
> >
> > - flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
> > + flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK | NOHZ_NEXT_KICK, nohz_flags(cpu));
> > +
> > + if (!flags)
> > + return;
> >
> > /*
> > * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
> > * (ie NOHZ_STATS_KICK set) and will do the same.
> > */
> > - if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
> > - _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK);
> > + if ((flags == (flags & (NOHZ_NEXT_KICK | NOHZ_NEWILB_KICK))) &&
> > + !need_resched())
> > + _nohz_idle_balance(cpu_rq(cpu), flags);
>
> This breaks the update of the blocked load.
> nohz_newidle_balance() only sets NOHZ_NEWILB_KICK (and not
> NOHZ_STATS_KICK) when it wants to update blocked load before going
> idle but it then sets NOHZ_STATS_KICK when calling_nohz_idle_balance()
> . We only clear NOHZ_NEWILB_KICK when fetching flags but we test if
> other bits have been set by a pending softirq which will do the same.
Yes, we realized this just after sending the RFC. Will fix, thank you!
> That's why we use NOHZ_NEWILB_KICK and not NOHZ_STATS_KICK directly.
> Similarly you can't directly use NOHZ_NEXT_KICK.
This is not an issue actually, as NEXT_KICK is only set by this path
(nohz_newidle_balance()) after this patch. The NEWILB_KICK and STATS_KICK
case is different where it can be updated in more than 1 path. Right?
> Also note that _nohz_idle_balance() is not robust against parallel run
True, though the parallelism is already happening. However your point is well
taken and we'll try to improve the existing code to make it more robust as
well (if needed). Will dig deeper into it.
thanks,
- Joel & Vineeth