Re: [PATCH 03/15] sched/fair: Add lag based placement

From: Benjamin Segall
Date: Thu Oct 12 2023 - 15:15:24 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> @@ -4853,49 +4872,119 @@ static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = avg_vruntime(cfs_rq);
> + s64 lag = 0;
>
> - /* sleeps up to a single latency don't count. */
> - if (!initial) {
> - unsigned long thresh;
> + /*
> + * Due to how V is constructed as the weighted average of entities,
> + * adding tasks with positive lag, or removing tasks with negative lag
> + * will move 'time' backwards, this can screw around with the lag of
> + * other tasks.
> + *
> + * EEVDF: placement strategy #1 / #2
> + */

So the big problem with EEVDF #1, compared to #2/#3 and to CFS (hacky
though CFS's placement is), is that it creates a significant perverse
incentive to yield or spin until you see yourself get preempted, rather
than just sleep, whenever you have any competition on the cpu. If you
go to sleep immediately after doing work and happen to do so near the
end of a slice (arguably what you _want_ to have happen overall), then
you have to pay that negative lag back as wakeup latency later, because
the lag is maintained through any amount of sleep. (#1 or something
like it is still what you want for reweight/migrate, of course.)
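
Very roughly, and ignoring the load scaling the actual patch does so
that V doesn't move when the entity rejoins, strategy #1 works out to
something like the sketch below (writing se->vlag for wherever the
dequeue-time lag is stashed):

	/*
	 * Sketch only, not the patch code: strategy #1 restores whatever
	 * lag the entity had at dequeue, no matter how long it slept, so
	 * negative lag comes straight back as extra wakeup latency.
	 */
	static void place_strategy_1(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		se->vruntime = avg_vruntime(cfs_rq) - se->vlag;
	}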

#2 could in theory be abused by micro-sleeping right before you are
preempted, but that isn't something tasks can really predict, unlike
#1, which just invites more of the "don't go to sleep, just spin, the
latency numbers are so much better" nonsense.