Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle

From: Vincent Guittot
Date: Thu Jun 03 2021 - 09:41:06 EST


On Thu, 3 Jun 2021 at 15:13, Odin Ugedal <odin@xxxxxxx> wrote:
>
> Hi,
>
> > Out of curiosity, why did you decide to use
> > cfs_rq->tg_load_avg_contrib instead of !cfs_rq_is_decayed(cfs_rq)
> > which is used to delete the cfs_rq from the list when updating blocked
> > load ?
>
> Well, the main reason was that it is currently (without the other in
> flight patches) not safe to just use "cfs_rq_is_decayed" directly,
> since that could result in
> a situation where tg_load_avg_contrib!=0 while
> cfs_rq_is_decayed()==true. I guess we can use cfs_rq_is_decayed() if
> you prefer that,
> and all the other PELT patches are merged. (This was initially why I
> thought a new field was a simpler and more elegant solution to make
> sure we book-keep correctly,
> but when the PELT stuff is fixed properly, that should be no real
> issue as long it works as we expect).

If it's only a matter of waiting other PELT patches to be merged, we
should use cfs_rq_is_decayed().

cfs_rq->tg_load_avg_contrib is used to reduce contention on
tg->load_avg but it is outside the scope of PELT and the update of
blocked load so we should avoid using it there

>
> I was also thinking about the cfs_rq->nr_running part; is there a
> chance of a situation where a cfs_rq->nr_running==1 and it has no
> load, resulting in it being decayed and
> removed from the list in "__update_blocked_fair"? I have not looked
> properly at it, but just wondering if that is actually possible..
>
>
> Also, out of curiosity, are there some implications of a situation
> where tg_load_avg_contrib=0 while *_load!=0, or would that not cause

do you mean all cfs_rq->avg.load_avg with *_load ?

> fairness issues?

if load_avg!=0, we will update it periodically and sync
tg_load_avg_contrib with the former. So it's not a problem.

The other way was a problem because we stop updating load_avg and
tg_load_avg_contrib when load_avg/load_sum is null so the
tg_load_avg_contrib is stalled with a possibly very old value


>
> Thanks
> Odin