Re: [PATCH 08/17] sched/fair: Implement an EEVDF like policy

From: Peter Zijlstra
Date: Wed Mar 29 2023 - 04:03:48 EST


On Tue, Mar 28, 2023 at 06:26:51PM -0700, Josh Don wrote:
> Hi Peter,
>
> This is a really interesting proposal and in general I think the
> incorporation of latency/deadline is quite a nice enhancement. We've
> struggled for a while to get better latency bounds on performance
> sensitive threads in the face of antagonism from overcommit.
>
> > void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > + s64 lag, limit;
> > +
> > SCHED_WARN_ON(!se->on_rq);
> > - se->vlag = avg_vruntime(cfs_rq) - se->vruntime;
> > + lag = avg_vruntime(cfs_rq) - se->vruntime;
> > +
> > + limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > + se->vlag = clamp(lag, -limit, limit);
>
> This is for dequeue; presumably you'd want to update the vlag at
> enqueue in case the average has moved again due to enqueue/dequeue of
> other entities?

Ha, just adding the entry back will shift the avgerage around and it's
all a giant pain in the backside.

place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
u64 vruntime = avg_vruntime(cfs_rq);
+ s64 lag = 0;

+ /*
+ * Due to how V is constructed as the weighted average of entities,
+ * adding tasks with positive lag, or removing tasks with negative lag
+ * will move 'time' backwards, this can screw around with the lag of
+ * other tasks.
+ *
+ * EEVDF: placement strategy #1 / #2
+ */
+ if (sched_feat(PLACE_LAG) && cfs_rq->nr_running > 1) {
+ struct sched_entity *curr = cfs_rq->curr;
+ unsigned long load;

+ lag = se->vlag;

/*
+ * If we want to place a task and preserve lag, we have to
+ * consider the effect of the new entity on the weighted
+ * average and compensate for this, otherwise lag can quickly
+ * evaporate:
+ *
+ * l_i = V - v_i <=> v_i = V - l_i
+ *
+ * V = v_avg = W*v_avg / W
+ *
+ * V' = (W*v_avg + w_i*v_i) / (W + w_i)
+ * = (W*v_avg + w_i(v_avg - l_i)) / (W + w_i)
+ * = v_avg + w_i*l_i/(W + w_i)
+ *
+ * l_i' = V' - v_i = v_avg + w_i*l_i/(W + w_i) - (v_avg - l)
+ * = l_i - w_i*l_i/(W + w_i)
+ *
+ * l_i = (W + w_i) * l_i' / W
*/
+ load = cfs_rq->avg_load;
+ if (curr && curr->on_rq)
+ load += curr->load.weight;
+
+ lag *= load + se->load.weight;
+ if (WARN_ON_ONCE(!load))
+ load = 1;
+ lag = div_s64(lag, load);

+ vruntime -= lag;
}


That ^ is the other side of it.

But yes, once enqueued, additional join/leave operations can/will shift
V around and lag changes, nothing much to do about that.

The paper does it all a wee bit differently, but I think it ends up
being the same. They explicitly track V (and shift it around on
join/leave) while I implicitly track it through the average and then
need to play games like the above, but in the end it should be the same.