Re: [PATCH] sched/fair: fix runnable_avg for throttled cfs

From: Vincent Guittot
Date: Thu Feb 27 2020 - 08:11:02 EST


On Thu, 27 Feb 2020 at 12:20, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 26.02.20 21:01, Vincent Guittot wrote:
> > On Wed, 26 Feb 2020 at 20:04, <bsegall@xxxxxxxxxx> wrote:
> >>
> >> Vincent Guittot <vincent.guittot@xxxxxxxxxx> writes:
> >>
> >>> When a cfs_rq is throttled, its group entity is dequeued and its running
> >>> tasks are removed. We must update runnable_avg with current h_nr_running
> >>> and update group_se->runnable_weight with new h_nr_running at each level
>
> ^^^
>
> Shouldn't this be 'current' rather 'new' h_nr_running for
> group_se->runnable_weight? IMHO, you want to cache the current value
> before you add/subtract task_delta.

hmm... it can't be current in both places. In my explanation,
"current" means the current situation when we started to throttle cfs
and "new" means the new situation after we finished to throttle the
cfs. I should probably use old and new to prevent any
misunderstanding.

That being said, we need to update runnable_avg with the old
h_nr_running: the one before we started to throttle the cfs which is
the value saved in group_se->runnable_weight. Once we have updated
runnable_avg, we save the new h_nr_running in
group_se->runnable_weight that will be used for next updates.

>
> >>> of the hierarchy.
> >>
> >> You'll also need to do this for task enqueue/dequeue inside of a
> >> throttled hierarchy, I'm pretty sure.
> >
> > AFAICT, this is already done with patch "sched/pelt: Add a new
> > runnable average signal" when task is enqueued/dequeued inside a
> > throttled hierarchy
> >
> >>
> >>>
> >>> Fixes: 9f68395333ad ("sched/pelt: Add a new runnable average signal")
> >>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >>> ---
> >>> This patch applies on top of tip/sched/core
> >>>
> >>> kernel/sched/fair.c | 10 ++++++++++
> >>> 1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index fcc968669aea..6d46974e9be7 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -4703,6 +4703,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
> >>>
> >>> if (dequeue)
> >>> dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP);
> >>> + else {
> >>> + update_load_avg(qcfs_rq, se, 0);
> >>> + se_update_runnable(se);
> >>> + }
> >>> +
> >>> qcfs_rq->h_nr_running -= task_delta;
> >>> qcfs_rq->idle_h_nr_running -= idle_task_delta;
> >>>
> >>> @@ -4772,6 +4777,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> >>> cfs_rq = cfs_rq_of(se);
> >>> if (enqueue)
> >>> enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> >>> + else {
> >>> + update_load_avg(cfs_rq, se, 0);
> >>
> >>
> >>> + se_update_runnable(se);
> >>> + }
> >>> +
> >>> cfs_rq->h_nr_running += task_delta;
> >>> cfs_rq->idle_h_nr_running += idle_task_delta;