Re: [PATCH 6/6 v2] sched/eevdf: Speedup short slice task scheduling

From: Vincent Guittot

Date: Tue Jun 16 2026 - 11:26:33 EST

On Tue, 16 Jun 2026 at 12:57, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 15, 2026 at 06:24:20PM +0200, Vincent Guittot wrote:
> > When a task with a shorter slice is enqueued, we protect the running
> > task which has a longer slice until it becomes ineligible instead of a
> > full slice in order to speed up the switch to other tasks until the task
> > with the shortest slice is scheduled. This helps the task avoid waiting
> > for too many full slices before running.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 601c67cff185..994fcf3ea702 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1091,7 +1091,10 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
> > slice = cfs_rq_min_slice(cfs_rq);
> >
> > slice = min(slice, se->slice);
> > - if (vruntime != se->vruntime || slice != se->slice)
> > +
> > + if (sched_feat(PREEMPT_SHORT) && slice < se->slice)
> > + vprot = avg_vruntime(cfs_rq);
> > + else if ((vruntime != se->vruntime) || (slice != se->slice))
> > vprot = min_vruntime(vprot, vruntime + calc_delta_fair(slice, se));
> >
> > se->vprot = vprot;
>
> I am not entirely sure I understand this one.
>
> avg_vruntime() could be ahead of se->deadline, esp for very short
> slices. This would then extend protection beyond the one slice..

Fair enough, I hadn't checked that we were not extending vprot (I will
add that check). I don't think this happens very often, because it can
only occur when a task with a shorter slice is enqueued and waiting to
run on the CPU, and we expect the lag to be shorter than the slice.
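
A rough userspace sketch of what such a check could look like, assuming
the protection point is simply clamped to the entity's deadline.
short_slice_vprot() and check_short_slice_vprot() are hypothetical names
for illustration only, and min_vruntime() here is a standalone model of
the helper used in the patch, not the kernel code itself:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wraparound-safe "earlier of two virtual times", modelled on the
 * min_vruntime() helper used in the patch above.
 */
static uint64_t min_vruntime(uint64_t a, uint64_t b)
{
	int64_t delta = (int64_t)(b - a);

	return delta < 0 ? b : a;
}

/*
 * Hypothetical helper (not in the patch): protect until the average
 * vruntime, but never beyond the entity's own deadline.
 */
static uint64_t short_slice_vprot(uint64_t deadline, uint64_t avg)
{
	return min_vruntime(deadline, avg);
}

/* Usage checks; the values are made up for illustration. */
void check_short_slice_vprot(void)
{
	/* avg ahead of the deadline: protection stays at the deadline */
	assert(short_slice_vprot(1000, 1500) == 1000);
	/* avg before the deadline: protection ends at avg */
	assert(short_slice_vprot(1000, 700) == 700);
}
```

With this kind of clamp, the case where avg_vruntime() is ahead of
se->deadline falls back to the existing one-slice limit.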

>
> Aside from that, there are but two protect_slice() callers that matter:
>
> - pick_eevdf(): this already has a hard limit on avg_vruntime()
>
> - update_curr(): this will trigger preemption when reaching either
> ->deadline or ->vprot.
>
>
> Also, the purpose of vprot is similar to the old min_gran, ensure any
> task gets *some* time and avoid the degenerate case of endlessly
> scheduling without 'any' real progress.
>
> For EEVDF this happens when tasks get arbitrarily close to
> avg_vruntime(). Eg, you have the two tasks A,B with A a virtual ns
> before avg (and per necessity the other 1 ns after). You run A until it's
> just past B, find it's no longer eligible, switch to B and do the same.
> This then results in max frequency context switches and minimal actual
> progress.
>
> The thing that was supposed to stop this is vprot, but if you
> consistently set vprot at avg_vruntime, this is effectively disabling
> vprot. No?

Yes, that's why it only happens when a shorter slice task is enqueued.
The other tasks that run before it should have a lag of around their
slice when this happens.
Note that I'm not using the sched hrtick, so once picked, se will run
for a tick (unless another wakeup happens).
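
To illustrate the tick granularity point, here is a simplified model. It
assumes HZ=1000, that the pick happens on a tick boundary, that no wakeup
arrives in between, and that the protected span is given in wall-clock
ns. TICK_NS and run_ns_without_hrtick() are hypothetical names, not
kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one tick is 1 ms. */
#define TICK_NS 1000000ULL

/*
 * Hypothetical model: without hrtick the protection is only re-checked
 * at tick time, so the run length is the protected span rounded up to
 * whole ticks, and never less than one tick.
 */
static uint64_t run_ns_without_hrtick(uint64_t protected_ns)
{
	uint64_t ticks = (protected_ns + TICK_NS - 1) / TICK_NS;

	if (ticks == 0)
		ticks = 1;

	return ticks * TICK_NS;
}

/* Usage checks; the values are made up for illustration. */
void check_run_ns_without_hrtick(void)
{
	/* even 1 ns of protection still yields a full tick of runtime */
	assert(run_ns_without_hrtick(1) == TICK_NS);
	/* just over one tick of protection rounds up to two ticks */
	assert(run_ns_without_hrtick(TICK_NS + 1) == 2 * TICK_NS);
}
```

Under these assumptions a very small vprot does not by itself lead to
sub-tick context switches.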

>
> Now, the conditions for this are such that this only happens for all
> tasks not of the minimal slice length in the tree. So in other words,
> you get spikes of high frequency scheduling just to burn vtime in order
> to achieve eligibility for the earliest min_slice task, right?

Not sure what you mean by high frequency scheduling, but each task
should run once, and just long enough to become either ineligible, or
still eligible but ordered after the short slice task because of the
deadline update.

>
> So what you really want is not avg_vruntime() but the actual
> se->vruntime of this earliest min_slice entity. Then we can simply run
> whatever task and not get hit with high frequency scheduling, and still
> achieve minimal latency for the waiting task.
>
> Now, we don't actually have a convenient way to get this specific task,
> but would something like so work?
>
> if (sched_feat(PREEMPT_SHORT) && slice != se->slice)
> vprot = min_vruntime(vprot, __pick_root_entity(cfs_rq)->vruntime);
>
> That is, we protect until the next earliest task becomes eligible.

I probably need to think a bit more about this, but if you have several
eligible tasks with very close vruntimes, won't this make the running
step even smaller, because __pick_root_entity(cfs_rq)->vruntime will
be earlier than avg_vruntime()?
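
A small userspace example of the concern, with made-up numbers and equal
weights. struct ent, the model_* helpers, run_step() and sample_rq are
hypothetical and only mimic the idea of a weighted average vruntime. The
waiting entity at 996 is an assumed stand-in for whatever
__pick_root_entity() would return:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a sched_entity. */
struct ent {
	int64_t vruntime;
	int64_t weight;
};

/* Weighted average vruntime over all entities, including the current one. */
static int64_t model_avg_vruntime(const struct ent *e, size_t n)
{
	int64_t sum = 0, load = 0;

	for (size_t i = 0; i < n; i++) {
		sum += e[i].vruntime * e[i].weight;
		load += e[i].weight;
	}
	assert(load > 0);

	return sum / load;
}

/* An entity is eligible when its vruntime is not past the average. */
static int model_eligible(const struct ent *se, int64_t avg)
{
	return se->vruntime <= avg;
}

/* Virtual time the current entity may run before reaching vprot. */
static int64_t run_step(const struct ent *curr, int64_t vprot)
{
	return vprot - curr->vruntime;
}

/* sample_rq[0] is the current entity; the others are waiting. */
static const struct ent sample_rq[] = {
	{ .vruntime = 990,  .weight = 1 },
	{ .vruntime = 996,  .weight = 1 },
	{ .vruntime = 1000, .weight = 1 },
	{ .vruntime = 1006, .weight = 1 },
};
```

Here the average is 998, so protecting until the average gives a step of
8, while protecting until the eligible waiting entity at 996 gives a step
of only 6. Any eligible entity has a vruntime at or before the average,
so in this model the second form can only shorten the step.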

>
>
> Or did I go off the rails somewhere?