Re: [RFC PATCH v2 1/5] sched/fair: Add ancestors of unthrottled undecayed cfs_rq

From: Vincent Guittot
Date: Thu Sep 09 2021 - 10:39:34 EST


On Thu, 19 Aug 2021 at 19:50, Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Since commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
> list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
> decayed into the load (leaf) list. We may skip adding some ancestors
> and thereby break the tmp_alone_branch invariant. This broke LTP test
> cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8a2
> ("sched/fair: Ensure that the CFS parent is added after unthrottling").
>
> I noticed the named test still fails even with the fix (but with low
> probability, 1 in ~1000 executions of the test). The reason is when
> bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
> the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.
>
> Fix this by adding ancestors if we notice the unthrottled cfs_rq was
> added to the load list.
>
> Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
> kernel/sched/fair.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 44c452072a1b..2c41a9007928 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4898,8 +4898,16 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>  	/* update hierarchical throttle state */
>  	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
>
> -	if (!cfs_rq->load.weight)
> +	if (!cfs_rq->load.weight) {
> +		/* Nothing to run but something to decay? Complete the branch */
> +		if (cfs_rq->on_list)

Could you use !cfs_rq_is_decayed(cfs_rq)?

> +			for_each_sched_entity(se) {
> +				if (list_add_leaf_cfs_rq(group_cfs_rq(se)))
> +					break;
> +			}
> +		assert_list_leaf_cfs_rq(rq);

Instead of adding a loop here, wouldn't it be better to jump to
unthrottle_throttle?

>  		return;
> +	}
>
>  	task_delta = cfs_rq->h_nr_running;
>  	idle_task_delta = cfs_rq->idle_h_nr_running;
> --
> 2.32.0
>
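
For illustration only, here is a toy userspace model (not kernel code) of
the "complete the branch" walk discussed above. It assumes a heavily
simplified leaf list: each level only carries a parent pointer and an
on_list flag, and a branch counts as connected once a level is the root or
has a listed parent. Every toy_* name is hypothetical and exists only for
this sketch; the real list_add_leaf_cfs_rq() also tracks tmp_alone_branch,
which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cfs_rq: only what the ancestor walk needs. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the root */
	bool on_list;			/* already on the (toy) leaf list? */
};

/*
 * Put rq on the toy list. Returns true once the branch is connected,
 * i.e. rq is the root or its parent is already listed.
 */
bool toy_list_add_leaf(struct toy_cfs_rq *rq)
{
	rq->on_list = true;
	return rq->parent == NULL || rq->parent->on_list;
}

/* Walk towards the root until the branch is connected; count the levels. */
int toy_complete_branch(struct toy_cfs_rq *rq)
{
	int visited = 0;

	for (; rq != NULL; rq = rq->parent) {
		visited++;
		if (toy_list_add_leaf(rq))
			break;
	}
	return visited;
}

/* Invariant check: rq and all of its ancestors are listed. */
bool toy_branch_is_complete(const struct toy_cfs_rq *rq)
{
	for (; rq != NULL; rq = rq->parent)
		if (!rq->on_list)
			return false;
	return true;
}

/* Scenario from the changelog: the leaf is listed, its parent is not. */
int toy_demo(void)
{
	struct toy_cfs_rq root = { NULL, true };
	struct toy_cfs_rq mid = { &root, false };
	struct toy_cfs_rq leaf = { &mid, true };
	int visited;

	assert(!toy_branch_is_complete(&leaf));	/* broken before the walk */
	visited = toy_complete_branch(&leaf);
	assert(toy_branch_is_complete(&leaf));	/* repaired after the walk */
	return visited;
}
```

In this model toy_demo() visits two levels (the leaf and its parent) and
stops there, because the root is already listed. This is the same early
exit that the loop after the unthrottle_throttle label relies on.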