Re: [PATCH 03/10] sched,fair: redefine runnable_load_avg as the sum of task_h_load

From: Rik van Riel
Date: Mon Jul 01 2019 - 12:47:41 EST


On Mon, 2019-07-01 at 12:29 -0400, Josef Bacik wrote:
> On Fri, Jun 28, 2019 at 04:49:06PM -0400, Rik van Riel wrote:
> > The runnable_load magic is used to quickly propagate information
> > about
> > runnable tasks up the hierarchy of runqueues. The runnable_load_avg
> > is
> > mostly used for the load balancing code, which only examines the
> > value at
> > the root cfs_rq.
> >
> > Redefine the root cfs_rq runnable_load_avg to be the sum of
> > task_h_loads
> > of the runnable tasks. This works because the hierarchical
> > runnable_load of
> > a task is already equal to the task_se_h_load today. This provides
> > enough
> > information to the load balancer.
> >
> > The runnable_load_avg of the cgroup cfs_rqs does not appear to be
> > used for anything, so don't bother calculating those.
> >
> > This removes one of the things that the code currently traverses
> > the
> > cgroup hierarchy for, and getting rid of it brings us one step
> > closer
> > to a flat runqueue for the CPU controller.
> >
>
> My memory on this stuff is very hazy, but IIRC we had the
> runnable_sum and the
> runnable_avg separated out because you could have the avg lag behind
> the sum.
> So like you enqueue a bunch of new entities who's avg may have
> decayed a bunch
> and so their overall load is not felt on the CPU until they start
> running, and
> now you've overloaded that CPU. The sum was there to make sure new
> things
> coming onto the CPU added actual load to the queue instead of looking
> like there
> was no load.
>
> Is this going to be a problem now with this new code?

That is a good question!

On the one hand, you may well be right.

On the other hand, while I see the old code calculating
runnable_sum, I don't really see it _using_ it to drive
scheduling decisions.

It would be easy to define the CPU cfs_rq->runnable_load_sum
as being the sum of task_se_h_weight() of each runnable task
on the CPU (for example), but what would we use it for?

What am I missing?

> +static inline void
> > +update_runnable_load_avg(struct sched_entity *se)
> > +{
> > + struct cfs_rq *root_cfs_rq = &cfs_rq_of(se)->rq->cfs;
> > + long new_h_load, delta;
> > +
> > + SCHED_WARN_ON(!entity_is_task(se));
> > +
> > + if (!se->on_rq)
> > + return;
> >
> > - sub_positive(&cfs_rq->avg.runnable_load_avg, se-
> > >avg.runnable_load_avg);
> > - sub_positive(&cfs_rq->avg.runnable_load_sum,
> > - se_runnable(se) * se->avg.runnable_load_sum);
> > + new_h_load = task_se_h_load(se);
> > + delta = new_h_load - se->enqueued_h_load;
> > + root_cfs_rq->avg.runnable_load_avg += delta;
>
> Should we use add_positive here? Thanks,

Yes, we should use add_positive. I'll do that for v3.

--
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part