Re: [PATCH] Revert "sched/fair: Fix O(nr_cgroups) in the load balancing path"
From: Vincent Guittot
Date: Tue Oct 29 2019 - 13:09:49 EST
On Tue, 29 Oct 2019 at 18:00, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Tue, 29 Oct 2019 at 17:50, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Oct 29, 2019 at 05:20:56PM +0100, Vincent Guittot wrote:
> > > On Tue, 29 Oct 2019 at 16:36, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 07:55:26AM -0700, Doug Smythies wrote:
> > > >
> > > > > I only know that the call to the intel_pstate driver doesn't
> > > > > happen, and that it is because cfs_rq_is_decayed returns TRUE.
> > > > > So, I am asserting that the request is not actually decayed, and
> > > > > should not have been deleted.
> > > >
> > > > So what cfs_rq_is_decayed() does is allow a cgroup's cfs_rq to be
> > > > removed from the list.
> > > >
> > > > Once it is removed, that cfs_rq will no longer be checked in the
> > > > update_blocked_averages() loop, which means done is less likely to be
> > > > set to false, which in turn makes it more likely that
> > > > rq->has_blocked_load becomes 0.
> > > >
> > > > Which all sounds good.
> > > >
> > > > Can you please trace what keeps the CPU awake?
> > >
> > > I think that the sequence below is what the intel_pstate driver was
> > > relying on:
> > >
> > > an rt/dl task wakes up and runs for some time;
> > > the rt/dl PELT signal is no longer null, so periodic decay happens.
> > >
> > > Before the optimization, update_cfs_rq_load_avg() was called for the
> > > root cfs_rq even if its PELT signal was null; it calls
> > > cfs_rq_util_change(), which calls into intel_pstate.
> > >
> > > After the optimization, it is no longer called.
> >
> > Not calling cfs_rq_util_change() when it doesn't change, seems like the
> > right thing. Why would intel_pstate want it called when it doesn't
> > change?
>
> Yes, I agree.
>
> My original thought was that the irq, rt or dl PELT signals were used
> to set the frequency, and that intel_pstate needed to be called to
> decrease this frequency while the PELT signals were decaying. But it
> doesn't seem to use them; it only needs to be called from time to time.
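A heavily simplified toy model in C may help make the sequence above concrete. Everything in it (struct toy_rq, toy_update_blocked(), toy_run(), and plain halving as the decay) is hypothetical: it only mimics the roles that update_blocked_averages(), update_cfs_rq_load_avg(), cfs_rq_util_change() and list_del_leaf_cfs_rq() play in this thread, and it assumes every pass crosses a PELT period boundary. It is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical toy model of the sequence discussed in this thread.
 * None of these types or functions are kernel APIs. */
struct toy_cfs_rq {
	unsigned long util;   /* stand-in for the root cfs_rq PELT signal */
	bool on_list;         /* still on the rq's leaf cfs_rq list? */
};

struct toy_rq {
	struct toy_cfs_rq root;   /* only the root cfs_rq is modeled */
	unsigned long rt_util;    /* stand-in for the rt/dl PELT signals */
	bool has_blocked_load;
	int freq_hook_calls;      /* times the cpufreq hook was invoked */
};

/* One pass of a simplified update_blocked_averages(); every pass is
 * assumed to cross a PELT period, and halving stands in for PELT decay.
 * remove_decayed selects the behaviour after the optimization. */
static void toy_update_blocked(struct toy_rq *rq, bool remove_decayed)
{
	bool done = true;

	if (rq->root.on_list) {
		rq->root.util /= 2;
		/* like cfs_rq_util_change(): runs even if util is already 0 */
		rq->freq_hook_calls++;
		if (remove_decayed && rq->root.util == 0)
			rq->root.on_list = false;  /* like list_del_leaf_cfs_rq() */
		else if (rq->root.util)
			done = false;
	}

	/* rt/dl decay keeps the periodic updates alive, but never calls
	 * the cpufreq hook itself. */
	rq->rt_util /= 2;
	if (rq->rt_util)
		done = false;

	rq->has_blocked_load = !done;
}

/* Start with a null cfs signal and a non-null rt/dl signal (an rt task
 * just ran), update until nothing is blocked, and count hook calls. */
int toy_run(bool remove_decayed, unsigned long rt_util)
{
	struct toy_rq rq = {
		.root = { .util = 0, .on_list = true },
		.rt_util = rt_util,
		.has_blocked_load = true,
	};

	while (rq.has_blocked_load)
		toy_update_blocked(&rq, remove_decayed);

	return rq.freq_hook_calls;
}
```

Under these assumptions, toy_run(false, 1024) invokes the hook on each of the 11 passes needed for the rt signal to reach zero, while toy_run(true, 1024) invokes it only once, on the pass that removes the already-null root cfs_rq from the list.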
Apart from Doug's problem, we have 2 possible problems with the
current update_blocked_averages():
1- irq, dl and rt are updated after cfs, but it is the cfs update that
calls schedutil to update the frequency, which means that this is
done with the old irq/rt/dl values. We should change the order and
start with irq, rt and dl.
2- when cfs is null but irq, rt or dl is not, we decay the values but
never call schedutil to update the frequency accordingly. The impact
is probably minimal because only irqs and timers can really run
without calling schedutil to update the frequency, but this can happen.
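A toy C sketch of both problems and of one possible reordering follows. It is an illustration under stated assumptions, not any actual patch: struct toy_sig_rq, toy_freq_hook() (standing in for the schedutil update), the single rt_util field lumping irq, rt and dl together, and plain halving as decay are all hypothetical. A null cfs signal is assumed to mean the root cfs_rq is already off the list.

```c
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: one "frequency request" is
 * derived from the sum of the per-class utilization signals. */
struct toy_sig_rq {
	unsigned long cfs_util;      /* stand-in for the root cfs_rq signal */
	unsigned long rt_util;       /* stand-in for the irq/rt/dl signals */
	unsigned long last_request;  /* value last handed to the hook */
};

/* Stand-in for schedutil aggregating the per-class signals. */
static void toy_freq_hook(struct toy_sig_rq *rq)
{
	rq->last_request = rq->cfs_util + rq->rt_util;
}

/* Current order: cfs first (the only path that calls the hook), then
 * irq/rt/dl. A null cfs signal is assumed to be off the list already. */
static void toy_update_current(struct toy_sig_rq *rq)
{
	if (rq->cfs_util) {
		rq->cfs_util /= 2;
		toy_freq_hook(rq);   /* problem 1: sees the stale rt value */
	}
	rq->rt_util /= 2;            /* problem 2: decays without a hook call */
}

/* Reordered sketch: irq/rt/dl first, then cfs, and call the hook
 * whenever anything decayed, even if cfs is null. */
static void toy_update_reordered(struct toy_sig_rq *rq)
{
	bool decayed = false;

	if (rq->rt_util) {
		rq->rt_util /= 2;
		decayed = true;
	}
	if (rq->cfs_util) {
		rq->cfs_util /= 2;
		decayed = true;
	}
	if (decayed)
		toy_freq_hook(rq);
}

/* Run `passes` updates and return the last request seen by the hook. */
unsigned long toy_last_request(bool reordered, unsigned long cfs,
			       unsigned long rt, int passes)
{
	struct toy_sig_rq rq = {
		.cfs_util = cfs,
		.rt_util = rt,
		.last_request = cfs + rt,  /* request made while tasks ran */
	};

	for (int i = 0; i < passes; i++) {
		if (reordered)
			toy_update_reordered(&rq);
		else
			toy_update_current(&rq);
	}
	return rq.last_request;
}
```

With cfs = 4 and rt = 8, one pass of the current order requests 2 + 8 = 10 because rt has not decayed yet, while the reordered version requests 2 + 4 = 6. With cfs = 0 and rt = 8, the current order leaves the request stuck at 8, whereas the reordered version brings it down to 0 after four passes.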
I'm going to prepare some patches