Re: [PATCH 06/17] sched/fair: Add lag based placement

From: Peter Zijlstra
Date: Wed Apr 05 2023 - 05:51:04 EST


On Mon, Apr 03, 2023 at 05:18:06PM +0800, Chen Yu wrote:
> On 2023-03-28 at 11:26:28 +0200, Peter Zijlstra wrote:
> > place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> [...]
> > /*
> > - * Halve their sleep time's effect, to allow
> > - * for a gentler effect of sleepers:
> > + * If we want to place a task and preserve lag, we have to
> > + * consider the effect of the new entity on the weighted
> > + * average and compensate for this, otherwise lag can quickly
> > + * evaporate:
> > + *
> > + * l_i = V - v_i <=> v_i = V - l_i
> > + *
> > + * V = v_avg = W*v_avg / W
> > + *
> > + * V' = (W*v_avg + w_i*v_i) / (W + w_i)
> If I understand correctly, V' means the avg_runtime if se_i is enqueued?
> Then,
>
> V = (\Sum w_j*v_j) / W

multiply by W on both sides to get:

V*W = \Sum w_j*v_j

> V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i)
>
> Not sure how W*v_avg equals to Sum w_j*v_j ?

V := v_avg

(yeah, I should clean up this stuff, already said to Josh I would)

> > + * = (W*v_avg + w_i(v_avg - l_i)) / (W + w_i)
> > + * = v_avg + w_i*l_i/(W + w_i)
> v_avg - w_i*l_i/(W + w_i) ?

Yup -- seems typing is hard :-)

> > + *
> > + * l_i' = V' - v_i = v_avg + w_i*l_i/(W + w_i) - (v_avg - l)
> > + * = l_i - w_i*l_i/(W + w_i)
> > + *
> > + * l_i = (W + w_i) * l_i' / W
> > */
> [...]
> > - if (sched_feat(GENTLE_FAIR_SLEEPERS))
> > - thresh >>= 1;
> > + load = cfs_rq->avg_load;
> > + if (curr && curr->on_rq)
> > + load += curr->load.weight;
> > +
> > + lag *= load + se->load.weight;
> > + if (WARN_ON_ONCE(!load))
> > + load = 1;
> > + lag = div_s64(lag, load);
> >
> Should we calculate
> l_i' = l_i * w / (W + w_i) instead of calculating l_i above? I thought we want to adjust
> the lag(before enqueue) based on the new weight(after enqueued)

We want to ensure the lag after placement is the lag we got before
dequeue.

I've updated the comment to read like so:

/*
* If we want to place a task and preserve lag, we have to
* consider the effect of the new entity on the weighted
* average and compensate for this, otherwise lag can quickly
* evaporate.
*
* Lag is defined as:
*
* l_i = V - v_i <=> v_i = V - l_i
*
* And we take V to be the weighted average of all v:
*
* V = (\Sum w_j*v_j) / W
*
* Where W is: \Sum w_j
*
* Then, the weighted average after adding an entity with lag
* l_i is given by:
*
* V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i)
* = (W*V + w_i*(V - l_i)) / (W + w_i)
* = (W*V + w_i*V - w_i*l_i) / (W + w_i)
* = (V*(W + w_i) - w_i*l) / (W + w_i)
* = V - w_i*l_i / (W + w_i)
*
* And the actual lag after adding an entity with l_i is:
*
* l'_i = V' - v_i
* = V - w_i*l_i / (W + w_i) - (V - l_i)
* = l_i - w_i*l_i / (W + w_i)
*
* Which is strictly less than l_i. So in order to preserve lag
* we should inflate the lag before placement such that the
* effective lag after placement comes out right.
*
* As such, invert the above relation for l'_i to get the l_i
* we need to use such that the lag after placement is the lag
* we computed before dequeue.
*
* l'_i = l_i - w_i*l_i / (W + w_i)
* = ((W + w_i)*l_i - w_i*l_i) / (W + w_i)
*
* (W + w_i)*l'_i = (W + w_i)*l_i - w_i*l_i
* = W*l_i
*
* l_i = (W + w_i)*l'_i / W
*/