Re: [PATCH 03/10] sched,fair: redefine runnable_load_avg as the sum of task_h_load
From: Rik van Riel
Date: Mon Jul 01 2019 - 15:32:51 EST
On Mon, 2019-07-01 at 15:22 -0400, Josef Bacik wrote:
> On Mon, Jul 01, 2019 at 12:47:35PM -0400, Rik van Riel wrote:
> > On Mon, 2019-07-01 at 12:29 -0400, Josef Bacik wrote:
> > >
> > > My memory on this stuff is very hazy, but IIRC we had the
> > > runnable_sum and the
> > > runnable_avg separated out because you could have the avg lag
> > > behind
> > > the sum.
> > > So like you enqueue a bunch of new entities who's avg may have
> > > decayed a bunch
> > > and so their overall load is not felt on the CPU until they start
> > > running, and
> > > now you've overloaded that CPU. The sum was there to make sure
> > > new
> > > things
> > > coming onto the CPU added actual load to the queue instead of
> > > looking
> > > like there
> > > was no load.
> > >
> > > Is this going to be a problem now with this new code?
> >
> > That is a good question!
> >
> > On the one hand, you may well be right.
> >
> > On the other hand, while I see the old code calculating
> > runnable_sum, I don't really see it _using_ it to drive
> > scheduling decisions.
> >
> > It would be easy to define the CPU cfs_rq->runnable_load_sum
> > as being the sum of task_se_h_weight() of each runnable task
> > on the CPU (for example), but what would we use it for?
> >
> > What am I missing?
>
> It's suuuuuper sublte, but you're right in that we don't really need
> the
> runnable_avg per-se, but what you do is you kill calc_group_runnable,
> which used
> to do this
>
> load_avg = max(cfs_rq->avg.load_avg,
> scale_load_down(cfs_rq->load.weight));
>
> runnable = max(cfs_rq->avg.runnable_load_avg,
> scale_load_down(cfs_rq->runnable_weight));
>
> so we'd account for this weirdness of adding a previously idle
> process to a new
> CPU and overloading the CPU because we'd add a bunch of these 0
> weight looking
> tasks that suddenly all wake up and are on the same CPU. So we used
> the
> runnable_weight to account for what was actually happening, and the
> max of
> load_avg and the weight to figure out what the potential load would
> be.
>
> What you've done here is change the weighting stuff to be completely
> based on
> load avg, which is problematic for the reasons above. Did you fix
> this later on
> in your patches? If so then just tell me to keep reading and I'll do
> that ;).
No, I have not fixed that later in the code.
I am not entirely sure how I could do that, without
reintroducing walking the hierarchy at task enqueue
and dequeue, but maybe you have some idea...
--
All Rights Reversed.
Attachment:
signature.asc
Description: This is a digitally signed message part