Re: Re: [PATCH 1/4] sched/eevdf: Fix vruntime adjustment on reweight

From: Abel Wu
Date: Thu Feb 29 2024 - 09:27:06 EST


Hi Tianchen,

On 2/29/24 5:24 PM, Tianchen Ding Wrote:
Hi Abel:

I'm not so familiar with eevdf and still learning. Here I've some questions about this patch.

1. You did proof that V will not change during reweight (COROLLARY #2). However, according to the original paper, the new V will be:
V' = V + lag(j)/(W - w_j) - lag(j)/(W - w_j + w'_j)
So the V' in paper will change (when lag is not zero).
Is the V in Linux code slightly different from the paper?

Good catch. And to the best of my knowledge, the answer is YES. The
above Equation in the paper, which is Eq. (20), is based on the
assumption that:

"once client 3 leaves, the remaining two clients will
proportionally support the eventual loss or gain in the
service time" -- Page 10

"by updating the virtual time according to Eq. (18,19) we
ensure that the sum over the lags of all active clients
is always zero" -- Page 11

But in Peter's implementation, it is the competitors in the new group
that client 3 later joins in who actually support the effect. So when
client 3 leaves competition with !0-lag in Linux, the rq's sum(lag_i)
is no longer zero.


2. I print some log around reweight_entity(), mainly want to print V by calling avg_vruntime(cfs_rq). I found your algorithm only keeps the V unchanged during reweight_eevdf(), but not reweight_entity().

In detail:
If curr is true (i.e., cfs_rq->curr == se), we will directly run reweight_eevdf(), and the V is not changed.
If curr is false, we will have __dequeue_entity() -> reweight_eevdf() -> __enqueue_entity(). The V is finally changed due to dequeue and enqueue. So the result of reweight_entity() will be impacted by "cfs_rq->curr == se", is this expected?

Good catch again! It smells like a bug. Since this @se is still on_rq,
it should be taken into consideration when calculating avg_runtime(),
but in fact it isn't because __dequeue_entity() will remove its share.

And I seem to spot another bug, although not relate to this problem,
that we actually need to call update_curr() unconditionally if curr is
available, because we need to commit curr's outstanding runtime to
ensure the result of avg_runtime() is up to date.

Thanks & BR,
Abel