Re: [PATCH 14/17] sched/eevdf: Better handle mixed slice length

From: Peter Zijlstra
Date: Tue Apr 04 2023 - 05:30:44 EST


On Fri, Mar 31, 2023 at 05:26:51PM +0200, Vincent Guittot wrote:
> On Tue, 28 Mar 2023 at 13:06, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > In the case where (due to latency-nice) there are different request
> > sizes in the tree, the smaller requests tend to be dominated by the
> > larger. Also note how the EEVDF lag limits are based on r_max.
> >
> > Therefore; add a heuristic that, for the mixed request size case, moves
> > smaller requests to placement strategy #2, which ensures they're
> > immediately eligible and, due to their smaller (virtual) deadline,
> > will cause preemption.
> >
> > NOTE: this relies on update_entity_lag() to impose lag limits above
> > a single slice.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 14 ++++++++++++++
> > kernel/sched/features.h | 1 +
> > kernel/sched/sched.h | 1 +
> > 3 files changed, 16 insertions(+)
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -616,6 +616,7 @@ avg_vruntime_add(struct cfs_rq *cfs_rq,
> > s64 key = entity_key(cfs_rq, se);
> >
> > cfs_rq->avg_vruntime += key * weight;
> > + cfs_rq->avg_slice += se->slice * weight;
> > cfs_rq->avg_load += weight;
> > }
> >
> > @@ -626,6 +627,7 @@ avg_vruntime_sub(struct cfs_rq *cfs_rq,
> > s64 key = entity_key(cfs_rq, se);
> >
> > cfs_rq->avg_vruntime -= key * weight;
> > + cfs_rq->avg_slice -= se->slice * weight;
> > cfs_rq->avg_load -= weight;
> > }
> >
> > @@ -4832,6 +4834,18 @@ place_entity(struct cfs_rq *cfs_rq, stru
> > lag = se->vlag;
> >
> > /*
> > + * For latency sensitive tasks; those that have a shorter than
> > + * average slice and do not fully consume the slice, transition
> > + * to EEVDF placement strategy #2.
> > + */
> > + if (sched_feat(PLACE_FUDGE) &&
> > + cfs_rq->avg_slice > se->slice * cfs_rq->avg_load) {
> > + lag += vslice;
> > + if (lag > 0)
> > + lag = 0;
>
> By using different lag policies for tasks, doesn't this create
> unfairness between tasks?

Possibly, I've just not managed to trigger it yet -- if it is an issue,
it can be fixed by ensuring we don't place the entity before its
previous vruntime, just like the sleeper hack later on.

> I wanted to stress this situation with a simple use case but it seems
> that even without changing the slice, there is a fairness problem:
>
> Task A always runs.
> Task B loops on: running 1ms, then sleeping 1ms.
> Default nice and latency-nice prio for both;
> each task should get around 50% of the time.
>
> The fairness is ok with tip/sched/core
> but with eevdf, Task B only gets around 30%
>
> I haven't identified the problem so far

Heh, this is actually the correct behaviour. If you have a u=1 and a
u=0.5 task, you should distribute time on a 2:1 basis, e.g. 67% vs 33%.

CFS has this sleeper bonus hack that makes it 50% vs 50%, but strictly
speaking that is not correct -- although it does help a number of weird
benchmarks.