Re: [PATCH v2 1/5] sched/fair: Reorder enqueue/dequeue_task_fair path

From: Vincent Guittot
Date: Wed Feb 19 2020 - 11:26:27 EST


On Wed, 19 Feb 2020 at 12:07, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 18/02/2020 15:15, Vincent Guittot wrote:
> > On Tue, 18 Feb 2020 at 14:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >> On Tue, Feb 18, 2020 at 01:37:37PM +0100, Dietmar Eggemann wrote:
> >>> On 14/02/2020 16:27, Vincent Guittot wrote:
> >>>> The walk through the cgroup hierarchy during the enqueue/dequeue of a task
> >>>> is split in 2 distinct parts for throttled cfs_rq without any added value
> >>>> but making code less readable.
> >>>>
> >>>> Change the code ordering such that everything related to a cfs_rq
> >>>> (throttled or not) will be done in the same loop.
> >>>>
> >>>> In addition, the same steps ordering is used when updating a cfs_rq:
> >>>> - update_load_avg
> >>>> - update_cfs_group
> >>>> - update *h_nr_running
> >>>
> >>> Is this code change really necessary? You pay with two extra goto's. We
> >>> still have the two for_each_sched_entity(se)'s because of 'if
> >>> (se->on_rq); break;'.
> >>
> >> IIRC he relies on the presented ordering in patch #5 -- adding the
> >> running_avg metric.
> >
> > Yes, that's the main reason, updating load_avg before h_nr_running
>
> My hunch is you refer to the new function:
>
> static inline void se_update_runnable(struct sched_entity *se)
> {
> 	if (!entity_is_task(se))
> 		se->runnable_weight = se->my_q->h_nr_running;
> }
>
> I don't see the dependency to the 'update_load_avg -> h_nr_running'
> order since it operates on se->my_q, not cfs_rq = cfs_rq_of(se), i.e.
> se->cfs_rq.
>
> What do I miss here?

update_load_avg() updates both the se and its cfs_rq. So if you update
cfs_rq->h_nr_running before calling update_load_avg(), as the 2nd
for_each_sched_entity loop does, the cfs_rq's runnable_avg for the
elapsed time slot is computed with the new h_nr_running value instead
of the previous value.
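
To illustrate the ordering issue, here is a minimal userspace sketch. It is
not kernel code: struct toy_cfs_rq and the toy_* helpers are hypothetical
names made up for this example, and the accounting is a plain
"h_nr_running * elapsed time" sum without the geometric decay that PELT
applies. It only shows which h_nr_running value gets charged to the elapsed
window in each ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stand-in for a cfs_rq; NOT the kernel's struct cfs_rq. */
struct toy_cfs_rq {
	uint64_t last_update_time;	/* when the sum was last brought up to date */
	unsigned int h_nr_running;	/* runnable tasks in this hierarchy */
	uint64_t runnable_sum;		/* undecayed sum of h_nr_running * elapsed time */
};

/*
 * Toy stand-in for update_load_avg(): charge the window
 * [last_update_time, now) using the h_nr_running value seen right now.
 */
static void toy_update_load_avg(struct toy_cfs_rq *rq, uint64_t now)
{
	assert(now >= rq->last_update_time);
	rq->runnable_sum += (uint64_t)rq->h_nr_running * (now - rq->last_update_time);
	rq->last_update_time = now;
}

/* Average first, counter second: the past window sees the old value. */
static uint64_t toy_enqueue_avg_first(struct toy_cfs_rq *rq, uint64_t now)
{
	toy_update_load_avg(rq, now);
	rq->h_nr_running++;
	return rq->runnable_sum;
}

/* Counter first, average second: the past window sees the new value. */
static uint64_t toy_enqueue_count_first(struct toy_cfs_rq *rq, uint64_t now)
{
	rq->h_nr_running++;
	toy_update_load_avg(rq, now);
	return rq->runnable_sum;
}
```

Starting from one runnable task and enqueueing a second one after 10 time
units, toy_enqueue_avg_first() charges 1 * 10 to the elapsed window, while
toy_enqueue_count_first() charges 2 * 10 even though the second task was not
runnable during that window.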