Re: [PATCH 2/6 v2] sched/eevdf: Take into account current's lag when updating slice protection

From: Peter Zijlstra

Date: Tue Jun 16 2026 - 05:35:55 EST


On Mon, Jun 15, 2026 at 06:24:16PM +0200, Vincent Guittot wrote:
> Take into account the lag of current task when setting the slice protection
> in order to ensure that the absolute value of lags will remain in the
> range [0 : slice+tick]
> A task that already has a negative lag will see its protection reduced
> whereas a task with positive lag will keep a full slice protection.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 83bce5a04f3d..b8d5d9bcc014 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1083,6 +1083,7 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> */
> static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + u64 vruntime = min_vruntime(se->vruntime, avg_vruntime(cfs_rq));
> u64 slice = normalized_sysctl_sched_base_slice;
> u64 vprot = se->deadline;
>
> @@ -1090,8 +1091,8 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
> slice = cfs_rq_min_slice(cfs_rq);
>
> slice = min(slice, se->slice);
> - if (slice != se->slice)
> - vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
> + if (vruntime != se->vruntime || slice != se->slice)
> + vprot = min_vruntime(vprot, vruntime + calc_delta_fair(slice, se));
>
> se->vprot = vprot;
> }

As already noted by Prateek, this doesn't seem to make much sense, since
we just got selected by schedule(), we *must* be left of avg_vruntime(),
otherwise we'd not be eligible and all that.

> @@ -1099,8 +1100,9 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
> static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> u64 slice = cfs_rq_min_slice(cfs_rq);
> + u64 vruntime = min_vruntime(se->vruntime, avg_vruntime(cfs_rq));
>
> - se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(slice, se));
> + se->vprot = min_vruntime(se->vprot, vruntime + calc_delta_fair(slice, se));
> }
>
> static inline bool protect_slice(struct sched_entity *se)

So:

- set_protect_slice() is called at: set_next_task(.first = true),
eg, only when the task gets scheduled().

- set_protect_slice() takes se->deadline as the baseline, and (when
RUN_TO_PARITY) computes a shorter vprot [ min_slice vs slice ].

- update_protect_slice() is called upon (failed) wakeup preemption,
new tasks have been added and as such the goal is to re-compute the
min_slice and possibly reduce vprot.

Right?


And while update_protect_slice() would ideally use the original
se->vruntime (as per set_next_task(.first = true) to compute any new
(shorter) vprot, per its use of min_vruntime() it can not in fact end up
with a vprot that is longer than the initial.

Now, your change is to use min(se->vruntime, avg_vruntime()) to increase
the chance of actually computing a shorter vprot. Still very much wrong,
but possibly less wrong.

Rather than taking avg_vruntime(), would it make sense to do something
like:

slice = cfs_rq_min_slice(cfs_rq);
slice -= se->sum_exec_runtime - se->prev_sum_exec_runtime;

vprot = se->vruntime;
if (slice < 0)
vprot -= calc_delta_fair(-slice, se);
else
vprot += calc_delta_fair(slice, se);

se->vprot = min_vruntime(se->vprot, vprot);

That is, reduce the slice with the time already ran.

Now, this will go sideways in the unlikely case of renice, and possibly
sched_change pattern (it seems we update prev_sum_exec_runtime for
!first), but overall it might be a better approximation, no?