Re: [PATCH 6/6 v2] sched/eevdf: Speedup short slice task scheduling

From: Vincent Guittot

Date: Wed Jun 17 2026 - 12:06:04 EST

On Tue, 16 Jun 2026 at 17:18, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Tue, 16 Jun 2026 at 12:57, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Jun 15, 2026 at 06:24:20PM +0200, Vincent Guittot wrote:
> > > When a task with a shorter slice is enqueued, we protect the running
> > > task which has a longer slice until it becomes ineligible instead of a
> > > full slice in order to speed up the switch to other tasks until the task
> > > with the shortest slice is scheduled. This helps the task avoid waiting
> > > too many full slices before running.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > ---
> > > kernel/sched/fair.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 601c67cff185..994fcf3ea702 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -1091,7 +1091,10 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
> > > slice = cfs_rq_min_slice(cfs_rq);
> > >
> > > slice = min(slice, se->slice);
> > > - if (vruntime != se->vruntime || slice != se->slice)
> > > +
> > > + if (sched_feat(PREEMPT_SHORT) && slice < se->slice)
> > > + vprot = avg_vruntime(cfs_rq);
> > > + else if ((vruntime != se->vruntime) || (slice != se->slice))
> > > vprot = min_vruntime(vprot, vruntime + calc_delta_fair(slice, se));
> > >
> > > se->vprot = vprot;
> >
> > I am not entirely sure I understand this one.
> >
> > avg_vruntime() could be ahead of se->deadline, esp for very short
> > slices. This would then extend protection beyond the one slice..
>
> Fair enough, I haven't checked that we were not extending the vprot
> (will add it). I don't think this happens that often, particularly
> because this only occurs when a task with a shorter slice is enqueued
> waiting to run on the CPU, and we expect the lag to be shorter than the
> slice.
>
> >
> > Aside from that, there are but two protect_slice() callers that matter:
> >
> > - pick_eevdf(): this already has a hard limit on avg_vruntime()
> >
> > - update_curr(): this will trigger preemption when reaching either
> > ->deadline or ->vprot.
> >
> >
> > Also, the purpose of vprot is similar to the old min_gran, ensure any
> > task gets *some* time and avoid the degenerate case of endlessly
> > scheduling without 'any' real progress.
> >
> > For EEVDF this happens when tasks get arbitrarily close to
> > avg_vruntime(). Eg, you have the two tasks A,B with A a virtual ns
> > before avg (and per necessity the other 1 ns after). You run A until it's
> > just past B, find it's no longer eligible, switch to B and do the same.
> > This then results in max frequency context switches and minimal actual
> > progress.
> >
> > The thing that was supposed to stop this is vprot, but if you
> > consistently set vprot at avg_vruntime, this is effectively disabling
> > vprot. No?
>
> Yes, that's why it only happens when a shorter slice task is enqueued.
> Other tasks that run before it should have a lag around their slice
> when this happens.
> Note that I'm not using the sched hrtick, so once picked, se will run for
> a tick (unless another wakeup happens).
>
> >
> > Now, the conditions for this are such that this only happens for all
> > tasks not of the minimal slice length in the tree. So in other words,
> > you get spikes of high frequency scheduling just to burn vtime in order
> > to achieve eligibility for the earliest min_slice task, right?
>
> Not sure what you mean by high frequency scheduling, but each task
> should run once and just long enough to become either ineligible, or
> still eligible but after the short slice task because of its deadline
> update.
>
> >
> > So what you really want is not avg_vruntime() but the actual
> > se->vruntime of this earliest min_slice entity. Then we can simply run
> > whatever task and not get hit with high frequency scheduling, and still
> > achieve minimal latency for the waiting task.
> >
> > Now, we don't actually have a convenient way to get this specific task,
> > but would something like so work?
> >
> > if (sched_feat(PREEMPT_SHORT) && slice != se->slice)
> > vprot = min_vruntime(vprot, __pick_root_entity(cfs_rq)->vruntime);
> >
> > That is, we protect until the next earliest task becomes eligible.
>
> I probably need to think a bit more about this, but if you have several
> eligible tasks with very close vruntime, won't this make the running
> step even smaller, because __pick_root_entity(cfs_rq)->vruntime will
> be earlier than avg_vruntime()?

When an entity with a shorter slice is enqueued, we want to set vprot
to a value that makes the next task to run ineligible, so that when we
reschedule we can move on to the next entity in the rb_tree, and so on
until the entity with the shorter slice runs.
This means we want the average vruntime at the point where next's key
is 0 (slightly above 0), not the current avg_vruntime.
I will try this.
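
For illustration only, here is a minimal userspace sketch of the
arithmetic behind "not the current avg_vruntime". It is a toy model, not
kernel code and not the patch: the struct and the toy_*() helpers are
hypothetical names. It assumes an entity is eligible while its vruntime
is at or below the load-weighted average vruntime of all entities
(running one included), and that only the running entity's vruntime
advances. Under those assumptions the average moves along with the
running entity, so the entity stays eligible until it reaches the
weighted average of the *other* entities, which is later than the
average computed at pick time.

```c
#include <assert.h>

/* Hypothetical toy entity, not the kernel's struct sched_entity. */
struct toy_entity {
	long long vruntime;	/* virtual runtime, arbitrary units, >= 0 */
	long long weight;	/* load weight, > 0 */
};

/* Load-weighted average vruntime over all n entities (rounded down). */
static long long toy_avg_vruntime(const struct toy_entity *e, int n)
{
	long long sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += e[i].weight * e[i].vruntime;
		load += e[i].weight;
	}
	return sum / load;
}

/* Eligible iff vruntime <= weighted average; compared without dividing. */
static int toy_eligible(const struct toy_entity *e, int n, int idx)
{
	long long sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += e[i].weight * e[i].vruntime;
		load += e[i].weight;
	}
	return e[idx].vruntime * load <= sum;
}

/*
 * Largest vruntime at which e[curr] is still eligible if only e[curr]
 * runs: the weighted average of every other entity (rounded down).
 */
static long long toy_stays_eligible_until(const struct toy_entity *e,
					  int n, int curr)
{
	long long sum = 0, load = 0;

	assert(n >= 2);	/* needs at least one other entity */
	for (int i = 0; i < n; i++) {
		if (i == curr)
			continue;
		sum += e[i].weight * e[i].vruntime;
		load += e[i].weight;
	}
	return sum / load;
}
```

With three equal-weight entities at vruntime 0, 30 and 60, and the
first one running, toy_avg_vruntime() gives 30 while
toy_stays_eligible_until() gives 45: an entity protected only up to 30
would still be eligible when the protection ends.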

>
> >
> >
> > Or did I go off the rails somewhere?