Re: [PATCH 19/24] sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
From: Vincent Guittot
Date: Wed Aug 14 2024 - 08:59:18 EST
On Wed, 14 Aug 2024 at 00:18, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 13, 2024 at 02:43:56PM +0200, Valentin Schneider wrote:
> > On 27/07/24 12:27, Peter Zijlstra wrote:
> > > Note that tasks that are kept on the runqueue to burn off negative
> > > lag, are not in fact runnable anymore, they'll get dequeued the moment
> > > they get picked.
> > >
> > > As such, don't count this time towards runnable.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > > ---
> > > kernel/sched/fair.c | 2 ++
> > > kernel/sched/sched.h | 6 ++++++
> > > 2 files changed, 8 insertions(+)
> > >
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5388,6 +5388,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> > > if (cfs_rq->next == se)
> > > cfs_rq->next = NULL;
> > > se->sched_delayed = 1;
> > > + update_load_avg(cfs_rq, se, 0);
> >
> > Shouldn't this be before setting ->sched_delayed? accumulate_sum() should
> > see the time delta as spent being runnable.
> >
> > > return false;
> > > }
> > > }
> > > @@ -6814,6 +6815,7 @@ requeue_delayed_entity(struct sched_enti
> > > }
> > >
> > > se->sched_delayed = 0;
> > > + update_load_avg(cfs_rq, se, 0);
> >
> > Ditto on the ordering
>
> Bah, so I remember thinking about it and then I obviously go and do it
> the exact wrong way around eh? Let me double check this tomorrow morning
> with the brain slightly more awake :/
>
> > > }
> > >
> > > /*
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -816,6 +816,9 @@ static inline void se_update_runnable(st
> > >
> > > static inline long se_runnable(struct sched_entity *se)
> > > {
> > > + if (se->sched_delayed)
> > > + return false;
> > > +
> >
> > Per __update_load_avg_se(), delayed-dequeue entities are still ->on_rq, so
> > their load signal will increase. Do we want a similar helper for the @load
> > input of ___update_load_sum()?
>
> So the whole reason to keep them enqueued is so that they can continue
> to compete for vruntime, and vruntime is load based. So it would be very
> weird to remove them from load.
We only use the weight to update vruntime, not the load. The load is
used to balance tasks between CPUs, and if we keep a "delayed" dequeued
task in the load, we will artificially inflate the load_avg on this rq.
Shouldn't we separately track the sum of the weights of delayed-dequeue
tasks, so that it is applied only to the vruntime update?
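
For illustration only, here is a standalone toy model of that idea, under
the assumption that "tracking separately" means keeping a second per-rq
sum for delayed entities. Every name below (toy_se, toy_rq, the toy_*
helpers) is hypothetical and does not exist in kernel/sched/; this is not
the kernel's data layout and not a proposed patch.

```c
#include <stdbool.h>

/* Hypothetical toy types; NOT the kernel's sched_entity / cfs_rq. */
struct toy_se {
	unsigned long weight;
	bool on_rq;
	bool sched_delayed;
};

struct toy_rq {
	unsigned long total_weight;   /* every queued entity, delayed or not */
	unsigned long delayed_weight; /* only the delayed-dequeue entities */
};

static void toy_enqueue(struct toy_rq *rq, struct toy_se *se)
{
	se->on_rq = true;
	se->sched_delayed = false;
	rq->total_weight += se->weight;
}

/* The entity stays queued to burn off negative lag; only mark it delayed. */
static void toy_delay_dequeue(struct toy_rq *rq, struct toy_se *se)
{
	if (!se->on_rq || se->sched_delayed)
		return;
	se->sched_delayed = true;
	rq->delayed_weight += se->weight;
}

/* The entity woke up again before it was picked: it is runnable once more. */
static void toy_requeue_delayed(struct toy_rq *rq, struct toy_se *se)
{
	if (!se->sched_delayed)
		return;
	se->sched_delayed = false;
	rq->delayed_weight -= se->weight;
}

/* Real dequeue: drop the entity from both sums as needed. */
static void toy_dequeue(struct toy_rq *rq, struct toy_se *se)
{
	if (!se->on_rq)
		return;
	if (se->sched_delayed) {
		rq->delayed_weight -= se->weight;
		se->sched_delayed = false;
	}
	rq->total_weight -= se->weight;
	se->on_rq = false;
}

/* Weight seen by the vruntime computation: delayed entities still compete. */
static unsigned long toy_vruntime_weight(const struct toy_rq *rq)
{
	return rq->total_weight;
}

/* Weight that would feed the load signal: delayed entities are excluded. */
static unsigned long toy_load_weight(const struct toy_rq *rq)
{
	return rq->total_weight - rq->delayed_weight;
}
```

With two entities of weight 1024 and 512 queued and the second one marked
delayed, toy_vruntime_weight() still reports 1536 while toy_load_weight()
reports 1024, which is the split being asked about above.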
>