Re: [RFC PATCH 0/6] sched: uclamp sum aggregation

From: Hongyan Xia
Date: Tue Dec 05 2023 - 12:24:06 EST

In reply to: Vincent Guittot: "Re: [RFC PATCH 0/6] sched: uclamp sum aggregation"

On 05/12/2023 16:26, Vincent Guittot wrote:

On Tue, 5 Dec 2023 at 16:19, Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:

On 04/12/2023 16:12, Vincent Guittot wrote:

On Mon, 4 Dec 2023 at 02:48, Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:

[...]

Other shortcomings are not that critical, but I don't think it is
acceptable that uclamp_min's effectiveness is divided by N under max
aggregation.
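
To make the divide-by-N point concrete, here is a minimal userspace sketch (not kernel code). It assumes N runnable tasks on one CPU and compares two ways of building the CPU-level request: applying the largest uclamp_min once to the summed utilization (max aggregation), versus clamping each task first and then summing (a simplified reading of sum aggregation). struct task, rq_request_max_agg, rq_request_sum_agg and per_task_share are hypothetical names, and the even split among tasks is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY_SCALE 1024u /* assumed full capacity of the CPU */

/* Hypothetical, simplified view of a task for this sketch. */
struct task {
	unsigned int util_avg;   /* raw PELT-style utilization */
	unsigned int uclamp_min; /* requested performance floor */
};

/* Max aggregation: sum raw utilization, then apply the largest floor once. */
static unsigned int rq_request_max_agg(const struct task *t, size_t n)
{
	unsigned int util = 0, floor = 0;

	for (size_t i = 0; i < n; i++) {
		util += t[i].util_avg;
		if (t[i].uclamp_min > floor)
			floor = t[i].uclamp_min;
	}
	if (util < floor)
		util = floor;
	return util > CAPACITY_SCALE ? CAPACITY_SCALE : util;
}

/* Sum aggregation (simplified): apply each task's floor, then add up. */
static unsigned int rq_request_sum_agg(const struct task *t, size_t n)
{
	unsigned int util = 0;

	for (size_t i = 0; i < n; i++)
		util += t[i].util_avg > t[i].uclamp_min ?
			t[i].util_avg : t[i].uclamp_min;
	return util > CAPACITY_SCALE ? CAPACITY_SCALE : util;
}

/* Assumed even split of the CPU-level request among n runnable tasks. */
static unsigned int per_task_share(unsigned int rq_request, size_t n)
{
	assert(n > 0);
	return rq_request / (unsigned int)n;
}
```

With four tasks that each have util_avg 50 and uclamp_min 200, the max-aggregated request is 200, which leaves 50 per task, a quarter of what each task asked for. The sum-aggregated request is 800, which leaves 200 per task.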

Change EAS task placement policy in this case to take into account
actual utilization and uclamp_min/max

Thank you. I agree. I want to emphasize this specifically because this
is exactly what I'm trying to do. The whole series can be rephrased in a
different way:

- The PELT signal is distorted when uclamp is active.

Sorry but no it's not

- Let's consider the [PELT, uclamp_min, uclamp_max] tuple.

That's what we are already doing with effective_cpu_util. We might
want to improve how we use them in EAS, but that is a separate matter
from your proposal

It's different. We never capture how we *got* the PELT value. When we wake up a task, what we have now is the following:

[p->util_avg, p->uclamp_min, p->uclamp_max, target_rq->uclamp_min, target_rq->uclamp_max]

But to best understand how big this task really was, we want:

[p->util_avg, previous_rq->uclamp_min_back_then, previous_rq->uclamp_max_back_then]

Without such information, issues cannot be avoided because we have no idea how big the task really was. Frequency spikes are just one symptom of misinterpreting how big the task was.

- Always carrying all three variables is too much, but [PELT,
clamped(PELT)] is an approximation that works really well.
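
As a rough illustration of the [PELT, clamped(PELT)] idea, the sketch below stores the raw value next to the value after the clamps that were in force at that moment, so the clamps themselves need not be carried around. This is a userspace sketch under assumptions, not the series' implementation: struct util_pair, make_util_pair and clamp_effect are hypothetical names.

```c
#include <assert.h>

/* Hypothetical pair: raw signal plus the same signal after clamping. */
struct util_pair {
	unsigned int raw;     /* PELT value as tracked */
	unsigned int clamped; /* value after the clamps in force at that time */
};

/* Record both values at once; the clamps are not stored afterwards. */
static struct util_pair make_util_pair(unsigned int pelt,
				       unsigned int uclamp_min,
				       unsigned int uclamp_max)
{
	struct util_pair p;

	assert(uclamp_min <= uclamp_max);
	p.raw = pelt;
	if (pelt < uclamp_min)
		p.clamped = uclamp_min;
	else if (pelt > uclamp_max)
		p.clamped = uclamp_max;
	else
		p.clamped = pelt;
	return p;
}

/* Positive: boosted by uclamp_min. Negative: capped by uclamp_max. */
static int clamp_effect(struct util_pair p)
{
	return (int)p.clamped - (int)p.raw;
}
```

The difference between the two fields records whether the task was boosted, capped, or left alone when the value was taken, which is the part of the history that the raw value alone does not carry.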

As said before, this mix is a no-go

I see your concern. To rephrase this series again, I'm simply arguing that

[p->util_avg, previous_uclamp_min, previous_uclamp_max]

is better than

[p->util_avg, p->uclamp_min, p->uclamp_max, target_rq->uclamp_min, target_rq->uclamp_max] plus the code to mitigate the issues

in estimating how big the task is.

I expect this series to be significantly smaller than the current max aggregation approach plus the future code needed to mitigate its problems, but I'll keep trying to improve it to hopefully address your concerns.

Of course, I'll explore whether there's a way to make things less messy. I
just realized why I didn't do things the util_est way and instead clamped
PELT directly: util_est boosts util_avg, so it can't work for
uclamp_max. I'll keep exploring options.
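
The util_est limitation can be sketched as follows, assuming the estimate is combined with util_avg by taking the larger of the two. A max() can raise the signal to honour uclamp_min, but it can never return less than util_avg, so it cannot express a uclamp_max cap. combine_max and clamp_util are hypothetical helper names for this userspace sketch.

```c
#include <assert.h>

/* util_est-style combination: the estimate can only raise the raw signal. */
static unsigned int combine_max(unsigned int util_avg, unsigned int est)
{
	return util_avg > est ? util_avg : est;
}

/* Direct clamping: the result may land above or below the raw signal. */
static unsigned int clamp_util(unsigned int util_avg,
			       unsigned int lo, unsigned int hi)
{
	assert(lo <= hi);
	if (util_avg < lo)
		return lo;
	if (util_avg > hi)
		return hi;
	return util_avg;
}
```

For a task with util_avg 100 and uclamp_min 400, both approaches yield 400. For a task with util_avg 600 and uclamp_max 300, combine_max(600, 300) still returns 600, while clamp_util(600, 0, 300) returns 300.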

[...]
